[BioC] Seeking assistance on ROC

Sean Davis seandavi at gmail.com
Tue Jan 19 12:53:41 CET 2010


On Tue, Jan 19, 2010 at 1:51 AM, Susan Bosco <susanbosco86 at yahoo.com> wrote:
> Dear friends,
>
> I have been trying to perform ROC analysis with ROCR. Since there is not much support for the queries I have switched to ROC package of Bioconductor.
>
> I'm trying to perform ROC analysis on Methylation data obtained from MeDiP experiment.The data set has values ranging from 0 to as large as 5.
>
> I have a couple of doubts with ROC as I had with ROCR.
> 1. When provided with different threshold values such as 0.6,0.7,0.8,0.9,each time I've got a plot which has the same curve as shown in the attached pdf.There's no change whatsoever in the curve with the different thresholds applied.Is the result
>  what i'm getting on the data set appropriate?(I've come across research papers with ROC analysis being implemented on Methylation data)
>
> 2. As ROC provides knowldege about the cut-off value for micro array data,while assigning a cut-off value,should one take into account the value of threshold given in ROC or the accuracy value?
>
> Following is my sessional info.
>
> load("RGKma.RData")
> state <-ifelse(RGKma$M[1:100,3] > 0.9, 1,0)
> print("RGKma$M:");print(RGKma$M[1:100,3])
> print("state:");print(state)
> data<-RGKma$M[1:100,3]
> R1<-rocdemo.sca(truth=state,data,dxrule.sca)
> pdf("rocK.pdf")
> plot(R1,col = "red")
> dev.off()
> print("ROC(R1):");print(ROC(R1,.3))
>
> [1] "RGKma$M:"
>   [1]  2.10538709 -0.07335174  2.13920582  0.18499421  3.30846203  1.69065450
>   [7]  4.24969667  1.37415619  1.65769067  5.39253767  1.19349192  5.40321575
>  [13]  3.06468274  1.34311072  0.68093156  4.03579639  2.91909842  3.36384055
>  [19]  3.54968030  4.06977722  2.31968962  3.17237025  2.80040216  3.01874372
>  [25]  1.89894809  4.17251372 -0.92690849  2.72505883  1.10609889  2.33584882
>  [31]  0.09886450  3.30066347  2.66466248  1.39238431  2.38782229  4.19572478
>  [37]  3.97185357  0.38627851 -0.09439237 -0.22948185  3.45955944  0.64538744
>  [43]  1.02627932 -0.53789425  4.17758537  2.87612185  3.25867248  1.89058878
>  [49]  2.71612450  3.06751911  2.63941028  1.03250743  2.07739372 -0.11727572
>  [55]  3.66338130  2.52249841  0.05683122  1.90834958  4.25784185  1.87577855
>  [61]  0.21814006  0.98911168  1.63475517  4.57600122  0.99326629  1.86706117
>  [67]  1.27215099  2.23056201 -0.81404957  1.12010588  1.62733217  0.41223049
>  [73]  3.43584658  3.78533569  2.33141286  3.15227631  1.51317488  3.37017353
>  [79] -0.57605695  2.96351684  2.82082253  2.85149236  1.43692942 -0.49898928
>  [85] -0.81504931 -0.75064053  1.11314716  2.51744122  2.49526189 -1.17086212
>  [91]  1.11677841  0.51370382  3.24834409  0.40958307  0.39834589  1.28139084
>  [97]  1.24613108  3.91323816  2.06097801  2.88980181
> [1] "state:"
>   [1] 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1
>  [38] 0 0 0 1 0 1 0 1
>  1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1
>  [75] 1 1 1 1 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 0 1 1 1 1 1
> [1] "ROC(R1):"
> [1] 1

Hi, Susan.

Your code is fine and the ROC package is doing as advertised, I think.
 There may be a bit of a misunderstanding of what an ROC curve
represents.  An ROC curve presents the False Positive Rate versus the
True Positive Rate across the thresholds in the data (that is, ALL
thresholds).  It can be used to pick the best threshold from all
possible thresholds.  Try your plot function like so:

plot(R1,show.thresh=TRUE)

For your data, if the threshold is below 0.9, you will never see a
false positive in your data and the true positive rate will simply
increase, resulting in the vertical line at x=0.  As soon as the
threshold passes 0.9, you have captured all the true positives and
everything else is a false positive, resulting in the rest of the
plot, a horizontal line at y=1.  Try this:

state[1:10]=1

and do the plot again.  Since the data are now not perfectly described
by a threshold rule (there is some noise), you will see a different
plot.

Let us know if there are other questions.

Sean



More information about the Bioconductor mailing list