[BioC] Seeking assistance on ROC

Sat Jan 23 12:28:07 CET 2010

Dear Sean,

Thanks again.

I corrected the script by setting the 'truth' variable with the rbinom() function. Since my data set is quite large (244K), I first tried the first 200 values, for which I got a proper-looking ROC curve. However, when I include the complete data, the plot changes: for the whole data set I get a nearly linear graph with small variations.

My script looks like this:
For the first 200 values of the data:
library(ROC)
load("RGKma.RData")
data <- RGKma$M[1:200, 3]
# random 0/1 labels, one per data value (truth and data must have equal length)
state <- rbinom(length(data), 1, 0.33)
R1 <- rocdemo.sca(truth = state, data, dxrule.sca)
pdf("ROCk.pdf")
plot(R1, show.thresh = TRUE, col = "red")
dev.off()

For the complete data:
library(ROC)
load("RGKma.RData")
state <- rbinom(length(RGKma$M[, 3]), 1, 0.33)
data <- RGKma$M[, 3]
R1 <- rocdemo.sca(truth = state, data, dxrule.sca)
pdf("ROCallk.pdf")
plot(R1, show.thresh = TRUE, col = "red")
dev.off()

I have attached the PDFs of both plots. I would appreciate it if you could help me with this problem, which I encounter only with the large data set.

Thanking you sincerely,
Susan.

--- On Wed, 20/1/10, Sean Davis <seandavi at gmail.com> wrote:

From: Sean Davis <seandavi at gmail.com>
Subject: Re: [BioC] Seeking assistance on ROC
To: "Susan Bosco" <susanbosco86 at yahoo.com>
Cc: bioconductor at stat.math.ethz.ch, "prashantha hebbar" <prashantha.hebbar at manipal.edu>
Date: Wednesday, 20 January, 2010, 12:05 PM

On Wed, Jan 20, 2010 at 12:39 AM, Susan Bosco <susanbosco86 at yahoo.com> wrote:

Dear Sean,

Thank you so much for the help.

I tried a range of thresholds from 0 to 0.9. As you had mentioned, the true positive rates did increase with thresholds below 0.9. However, I still got some false positives even at a minimum threshold of 0.1. Could you kindly explain the reason?

Is there any method of finding the optimal threshold, maximizing the true positive rate while minimizing the false positives, instead of randomly choosing between 0 and 0.9?

Hi, Susan.  The ROC curve IS that method.  The ROC curve represents ALL thresholds as applied to the data.  If you plot with show.thresh=TRUE, you will see the thresholds that were tried and where they are on the curve.  
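
To make the "all thresholds" point concrete, here is a minimal base-R sketch that sweeps every observed value as a threshold and records the resulting false and true positive rates. It does not use the ROC package: the simulated scores and labels, the helper name roc_points, and the "positive when score >= threshold" rule are all assumptions made for illustration. Picking the point that maximises TPR - FPR (Youden's J) is just one common way to read a single threshold off the curve, not the only one.

```r
# Simulated scores with known labels (hypothetical data, not RGKma).
set.seed(1)
truth <- rep(c(0, 1), each = 50)
score <- c(rnorm(50, mean = 0), rnorm(50, mean = 1.5))

# Hypothetical helper: try every observed score as a threshold and
# call a sample positive when its score is at or above that threshold.
roc_points <- function(truth, score) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(score[truth == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(score[truth == 0] >= t))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

roc <- roc_points(truth, score)

# One possible summary: the threshold with the largest TPR - FPR.
best <- roc[which.max(roc$tpr - roc$fpr), ]
print(best)
```

Plotting roc$fpr against roc$tpr gives the curve; each row of roc is one threshold on it.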

If the threshold to which you are referring is the one that you used to determine the variable you called "state", then we are talking about two different things.  The "truth" variable is meant to be assigned by some source other than the data themselves.  If you do not know the true state of your samples and find yourself assigning the state from the data, then ROC curve analysis will not be of any use.
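
A small simulation may help show why the source of "truth" matters. The sketch below (base R only; the simulated scores, the sample size, and the helper name auc are assumptions, not taken from the thread's data) compares labels drawn with rbinom() independently of the scores against labels that really do depend on the scores. With independent labels the area under the ROC curve should sit close to 0.5, which corresponds to a curve hugging the diagonal; with only a few hundred points, sampling noise can make such a curve wander away from the diagonal and look more informative than it is.

```r
set.seed(42)
n <- 20000
score <- rnorm(n)                  # stand-in for one data column (hypothetical)

# Labels drawn independently of the scores.
truth_random <- rbinom(n, 1, 0.33)

# Labels whose probability increases with the score (illustrative only).
truth_informative <- rbinom(n, 1, plogis(2 * score))

# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) identity.
auc <- function(truth, score) {
  r  <- rank(score)
  n1 <- sum(truth == 1)
  n0 <- sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_random      <- auc(truth_random, score)
auc_informative <- auc(truth_informative, score)
print(c(random = auc_random, informative = auc_informative))
```

An AUC near 0.5 means the scores do not separate the two label groups at all, so no choice of threshold can help.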

Sean

Thanks in advance,

Susan.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ROC_K.pdf
Type: application/pdf
Size: 10758 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20100123/5dcd5abd/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ROC_allK.pdf
Type: application/pdf
Size: 17944800 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20100123/5dcd5abd/attachment-0003.pdf>