[R] dx accuracy measures from raw data

Byron Dom byron_dom at yahoo.com
Wed Jul 23 06:58:34 CEST 2014

Here is a partial answer (I think).

A common way to display results of this type is as a "receiver operating characteristic." See: http://en.wikipedia.org/wiki/Receiver_operating_characteristic

It's displayed as a parametric curve in which the parameter is the threshold value, the x-value (abscissa) is the false-positive rate, and the y-value (ordinate) is the true-positive rate. A commonly used single-number summary is the area under this curve (AUC), with the false-positive rate running from 0 to 1. There are variations on this, but I've just described the standard one.
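To make the construction concrete, here is a minimal base-R sketch. The vectors `truth` and `score` are made-up toy data (not from the study in question), and it assumes a continuous test result where higher values mean "more likely positive". It sweeps the threshold, records one (false-positive rate, true-positive rate) point per threshold, and integrates with the trapezoidal rule.

```r
# Hypothetical toy data: 'truth' is the gold standard (1 = condition present),
# 'score' is a continuous test result. Neither comes from the original study.
truth <- c(1, 1, 1, 1, 0, 1, 0, 0, 1, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1)

# Sweep the threshold over every observed score, from strictest to most lenient.
# Inf is included so that the curve starts at the point (0, 0).
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))

# A subject is called positive when score >= threshold.
tpr <- sapply(thresholds, function(th) mean(score[truth == 1] >= th))
fpr <- sapply(thresholds, function(th) mean(score[truth == 0] >= th))

# Area under the curve by the trapezoidal rule over consecutive ROC points.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
```

Calling `plot(fpr, tpr, type = "b")` afterwards draws the curve. With binary test columns rather than a continuous score, each column contributes a single point on this plot.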

There are multiple R packages that will do all of this for you. One of them is the pROC package. See http://cran.at.r-project.org/web/packages/pROC/pROC.pdf.
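For the quantities asked about in the original message (sensitivity, specificity, PPV and NPV with 95% CIs for each test column), the following is a rough base-R sketch that avoids building each 2 x 2 table by hand. The data frame `dat` is invented toy data in the shape described in the question, and the helper names `prop_ci` and `accuracy` are my own, not from any package. The intervals are exact (Clopper-Pearson) binomial intervals from `binom.test()`; other interval methods would give slightly different limits.

```r
# Hypothetical toy data shaped like the data frame in the question:
# g.s is the gold standard ("Yes"/"No"), t1..t3 are binary test results (1/0).
dat <- data.frame(
  g.s = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  t1  = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0),
  t2  = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  t3  = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Proportion x/n with an exact (Clopper-Pearson) 95% confidence interval.
# Note: binom.test() raises an error if n is 0.
prop_ci <- function(x, n) {
  bt <- binom.test(x, n)
  c(est = x / n, lower = bt$conf.int[1], upper = bt$conf.int[2])
}

# Count the four cells of the 2 x 2 table for one test column and
# return a matrix with one row per accuracy measure.
accuracy <- function(test, gold, positive = "Yes") {
  tp <- sum(test == 1 & gold == positive)
  fp <- sum(test == 1 & gold != positive)
  fn <- sum(test == 0 & gold == positive)
  tn <- sum(test == 0 & gold != positive)
  rbind(
    sensitivity = prop_ci(tp, tp + fn),
    specificity = prop_ci(tn, tn + fp),
    PPV         = prop_ci(tp, tp + fp),
    NPV         = prop_ci(tn, tn + fn)
  )
}

# Apply the same calculation to every test column in one go.
tests   <- c("t1", "t2", "t3")
results <- lapply(dat[tests], accuracy, gold = dat$g.s)
```

`results` is a named list with one matrix per test, so `results$t2` shows the estimates and limits for t2. The `positive` argument has to match however the gold standard is actually coded in the real data.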

Date: Sun, 20 Jul 2014 18:28:12 +0100
From: Anoop Shah <anoopsshah at gmail.com>
To: r-help at r-project.org
Subject: [R] dx accuracy measures from raw data
Message-ID: <0E7574AF-9890-419E-AE9D-978860054AF2 at gmail.com>
Content-Type: text/plain

Hello R users!

I am a medic and have been working with R for about 6 months now.

I was hoping to pick someone’s brain about a diagnostic accuracy study that has now been completed.

I am trying to derive the sensitivity, specificity, NPV and PPV with the corresponding 95% CI from the raw data.

My data is in a data frame as below

g.s    t1    t2    t3    t3    t4    t5    index
Yes    1    1    1    1    1    1    1
Yes    1    1    1    1    1    1    2
Yes    1    1    1    1    1    1    3
Yes    1    1    1    1    1    1    4
Yes    1    1    1    1    1    1    5

Each row represents a patient with a unique id (variable: index).

g.s is a binary variable and represents the result from the gold standard (yes / no).

t1 to t5 are the tests at different thresholds being tested.

t1 to t5 are all binary variables with 1 as yes and 0 as no.

Now I could create separate 2 x 2 tables for each threshold (t1 to t5) against the gold standard and then derive sensitivity, specificity, NPV and PPV plus their 95% CI for each threshold.

I was however wondering if there was a more efficient way to get these results from the raw data in R.

Hope I have explained myself clearly, and thanks a lot in advance!

