[R] dx accuracy measures from raw data
Byron Dom
byron_dom at yahoo.com
Wed Jul 23 06:58:34 CEST 2014
Here is a partial answer, I think.
A common way to display results of this type is as a "receiver operating characteristic." See: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
It is displayed as a parametric curve in which the parameter is the threshold value, the x-value (abscissa) is the false-positive rate (1 - specificity) and the y-value (ordinate) is the true-positive rate (sensitivity). A commonly used single-number summary is the area under this curve (AUC), with the false-positive rate running from 0 to 1. There are variations on this, but I've just described the standard one.
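To make the construction concrete, here is a minimal base-R sketch of an ROC curve and its AUC. The vectors score and truth are made-up illustration data (not from the question), and the trapezoidal rule is just one common way to compute the area; treat it as a sketch rather than a reference implementation.

```r
# Hypothetical example data: a continuous test score and a binary gold standard.
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1)
truth <- c(1,   1,   0,   1,   1,    0,   0,   1,   0,   0)

# Candidate thresholds: Inf gives the (0, 0) corner of the curve,
# followed by every observed score in descending order.
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))

# At each threshold a patient is called positive when score >= threshold.
tpr <- sapply(thresholds, function(th) mean(score[truth == 1] >= th))  # sensitivity
fpr <- sapply(thresholds, function(th) mean(score[truth == 0] >= th))  # 1 - specificity

# Area under the curve by the trapezoidal rule, for FPR running from 0 to 1.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
```

Calling plot(fpr, tpr, type = "b") then draws the curve, and abline(0, 1, lty = 2) adds the chance diagonal for comparison.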
There are multiple R packages that will do all of this for you. One of them is the pROC package; see http://cran.at.r-project.org/web/packages/pROC/pROC.pdf.
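For the layout in the question quoted below (one binary column per threshold), the 2 x 2 quantities can also be computed for all thresholds in one pass with base R. The sketch below is only an illustration under stated assumptions: the data frame dat is invented, the helper names prop_ci and accuracy are hypothetical, and the intervals are exact (Clopper-Pearson) binomial intervals from binom.test(), which is one of several reasonable choices.

```r
# Hypothetical data in the layout from the question: one row per patient,
# g.s is the gold standard, t1..t5 are the test results at each threshold.
dat <- data.frame(
  g.s = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  t1  = c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
  t2  = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0),
  t3  = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  t4  = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  t5  = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Proportion x / n with an exact 95% CI from binom.test().
# Note: binom.test() stops with an error if n is 0.
prop_ci <- function(x, n) {
  bt <- binom.test(x, n)
  c(est = x / n, lower = bt$conf.int[1], upper = bt$conf.int[2])
}

# Sensitivity, specificity, PPV and NPV for one binary test column.
accuracy <- function(test, gold) {
  tp <- sum(test == 1 & gold == "Yes")
  fn <- sum(test == 0 & gold == "Yes")
  fp <- sum(test == 1 & gold == "No")
  tn <- sum(test == 0 & gold == "No")
  rbind(sensitivity = prop_ci(tp, tp + fn),
        specificity = prop_ci(tn, tn + fp),
        PPV         = prop_ci(tp, tp + fp),
        NPV         = prop_ci(tn, tn + fn))
}

# Apply the same calculation to every threshold column at once.
tests   <- paste0("t", 1:5)
results <- lapply(dat[tests], accuracy, gold = dat$g.s)
```

Here results is a named list with one 4 x 3 matrix per threshold, so results$t3 shows the estimates and interval limits for t3. Keep in mind that PPV and NPV depend on the prevalence in the study sample.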
===============================================================
Date: Sun, 20 Jul 2014 18:28:12 +0100
From: Anoop Shah <anoopsshah at gmail.com>
To: r-help at r-project.org
Subject: [R] dx accuracy measures from raw data
Message-ID: <0E7574AF-9890-419E-AE9D-978860054AF2 at gmail.com>
Content-Type: text/plain
Hello R users!
I am a medic and have been working with R for about 6 months now.
I was hoping to pick someone’s brain about a diagnostic accuracy study that has now been completed.
I am trying to derive the sensitivity, specificity, NPV and PPV with the corresponding 95% CI from the raw data.
My data is in a data frame as below
g.s  t1  t2  t3  t3  t4  t5  index
Yes  1   1   1   1   1   1   1
Yes  1   1   1   1   1   1   2
Yes  1   1   1   1   1   1   3
Yes  1   1   1   1   1   1   4
Yes  1   1   1   1   1   1   5
Each row represents a patient with a unique id (variable: index).
g.s is a binary variable and represents the result from the gold standard (yes / no).
t1 to t5 are the tests at different thresholds being tested.
t1 to t5 are all binary variables with 1 as yes and 0 as no.
Now I could create separate 2 x 2 tables for each threshold (t1 to t5) against the gold standard and subsequently derive sensitivity, specificity, NPV and PPV plus their 95% CI for each threshold (t1 to t5).
I was however wondering if there was a more efficient way to get these results from the raw data in R.
I hope I have explained myself clearly, and thanks a lot in advance!
Cheers
Anoop