[R] Survey of ROC AUC / wilcoxon test functions

Tuszynski, Jaroslaw W. JAROSLAW.W.TUSZYNSKI at saic.com
Thu Sep 22 20:59:52 CEST 2005


Hi,

I was lately debugging parts of my 'colAUC' function in caTools package, and
in a process looked into other packages for calculating Areas Under ROC
Curves (AUC).  To my surprise I found at least 6 other functions: 
*	  wilcox.test
*	  AUC from ROC package, 
*	  performance from ROCR package, 
*	  auROC from limma package, 
*	  ROC from Epi package, 
*	  roc.area from verification package
And I belive I found problems with 3 of them: 'ROC', 'auROC' & 'roc.area' 

I wrote short code to compare them all:

	# Compare calAUC with other functions designed for similar purpose
	library(caTools)
	library(verification)
	library(ROC)
	library(ROCR)
	library(Epi)
	library(limma)
	library(MASS) 
	data(cats)
	colAUC(cats[,2:3], cats[,1], plotROC=TRUE) 
	auc  = matrix(NA,9,3)
      pval = matrix(NA,4,3)
	rownames(auc)  = c("colAUC(alg='ROC')", "colAUC(alg='Wilcox')",
"wilcox.test",
	                   "rank", "roc.area", "AUC", "performance", "ROC",
"auROC")
	colnames(auc ) = c("AUC(x)", "AUC(-x)", "AUC(x+noise)")
	rownames(pval) = c("wilcox.test(exact=1)","wilcox.test(exact=0)", 
                         "roc.area()$p.adj","roc.area()$p")
      colnames(pval) = c("p(x)", "p(-x)", "p(x+noise)")
	X = cbind(cats[,2], -cats[,2], cats[,2]+rnorm(nrow(cats))/10 )
	y = ifelse(cats[,1]=='F',0,1)
	for (i in 1:3) {
	  x = X[,i]
	  x1 = x[y==1]; n1 = length(x1);            # prepare input data ...
	  x2 = x[y==0]; n2 = length(x2);            # ... into required
format
	  r = rank(c(x1,x2))  
	  auc[1,i] = colAUC(x, y, alg="ROC") 
	  auc[2,i] = colAUC(x, y, alg="Wilcox")       
	  auc[3,i] = wilcox.test(x1, x2, exact=0)$statistic / (n1*n2)
        auc[4,i] = (sum(r[1:n1]) - n1*(n1+1)/2) / (n1*n2) 
	  auc[5,i] = roc.area(y, x)$A.tilda           
	  auc[6,i] = AUC(rocdemo.sca(y, x, dxrule.sca))    
	  auc[7,i] = performance(prediction( x, y),"auc")@y.values[[1]]
	  auc[8,i] = ROC(x,y,grid=0)$AUC # get AUC by 'ROC'
	  auc[9,i] = auROC(y, x)   # get AUC by 'auROC'
	  pval[1,i] = wilcox.test(x1, x2, exact=0)$p.value
	  pval[2,i] = wilcox.test(x1, x2, exact=1)$p.value

        pval[3,i] = roc.area(y, x)$p.adj
        pval[4,i] = roc.area(y, x)$p	  	      	  	      
      }
	print(auc)
      print(pval)

    

Which gave the following results:

	> print(auc)
   	                     AUC(x)     AUC(-x)    AUC(x+noise)
	colAUC(alg='ROC')    0.8338451  0.8338451  0.8225488
	colAUC(alg='Wilcox') 0.8338451  0.8338451  0.8225488
	wilcox.test          0.8338451  0.1661549  0.8225488
	rank                 0.8338451  0.1661549  0.8225488
	roc.area             0.8338451  0.1661549  0.8225488
	AUC                  0.8338451  0.1661549  0.8225488
	performance          0.8338451  0.1661549  0.8225488
	ROC                  0.8338451  0.1654968  0.8225488
	auROC                0.8131169  0.1454266  0.8225488
	>       print(pval)
	                     p(x)       p(-x)      p(x+noise)
	wilcox.test(exact=1) 8.200e-11  8.200e-11  3.774e-10
	wilcox.test(exact=0) 8.200e-11  8.200e-11  3.817e-11
	roc.area()$p.adj     4.042e-11  1.000e+00  1.861e-10
	roc.area()$p         4.446e-11  1.000e+00  1.861e-10


Some thoughts about those results:
- Data used for the testing: 
    column 1 (x) - has ties, 
    column 2 (-x) - negative of the same data
    column 3 (x+noise) - similar data but with no ties
- All AUC functions gave the same results for data with no ties
- 'ROC' and 'auROC' gave different results for data with ties
- 'colAUC' (my function) returns 'max(auc, 1-auc)' that's why AUC(x) and
AUC(-x) are the same
- I assume that "roc.area()$p.adj" and "wilcox.test" p.values suppose to
return the same results, but they are different.
- At least 'roc.area(x)$p.adj' and 'roc.area(-x)$p.adj' should be the same.


Comments? Corrections?

Jarek 
====================================================\====                 
 Jarek Tuszynski, PhD.                           o / \ 
 Science Applications International Corporation  <\__,|  
 (703) 676-4192                                   ">   \ 
 Jaroslaw.W.Tuszynski at saic.com                    `     \




More information about the R-help mailing list