[R] ROCR.plot methods, cross validation averaging

Wed Sep 23 18:11:37 CEST 2009

Dear R-help and ROCR developers (Tobias Sing and Oliver Sander),

I think my first question is generic and could apply to many methods, 
which is why I'm directing this initially to R-help as well as Tobias and Oliver.

Question 1. The plot method in ROCR will average cross-validation runs if
asked (via its avg argument). I'd like to use that averaged data to find a
"best" cutoff, but I can't figure out how to get at the actual data that
get plotted. Simply assigning the result of the plot call (such as
test <- plot(mydata)) doesn't do it.
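
Once averaged points are available, choosing a cutoff from them is straightforward. The sketch below assumes the averaged data come as three parallel vectors (cutoff, FPR, TPR) and uses Youden's J (TPR minus FPR) as the "best" criterion. The helper name best_cutoff, the criterion, and the numbers are illustrative assumptions and are not part of ROCR.

```r
# Hypothetical helper (not part of ROCR): pick a cutoff from an averaged
# ROC curve by maximising Youden's J = TPR - FPR. The criterion is an
# assumption; a cost-weighted criterion may suit other problems better.
best_cutoff <- function(cutoff, fpr, tpr) {
  stopifnot(length(cutoff) == length(fpr), length(fpr) == length(tpr))
  j <- tpr - fpr        # Youden's J at each cutoff
  i <- which.max(j)     # first cutoff attaining the maximum
  list(cutoff = cutoff[i], fpr = fpr[i], tpr = tpr[i], J = j[i])
}

# Made-up averaged points, for illustration only
cutoff <- c(0.9, 0.7, 0.5, 0.3, 0.1)
fpr    <- c(0.00, 0.10, 0.25, 0.60, 0.90)
tpr    <- c(0.20, 0.55, 0.80, 0.90, 1.00)
res <- best_cutoff(cutoff, fpr, tpr)
print(res)
```

which.max returns the first maximum, so ties go to the highest cutoff when cutoffs are listed in decreasing order.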

Question 2. I am asking ROCR to average lists whose entries have varying
lengths. See my example below. None of the ROCR examples have data
structured in this manner. Can anyone say whether the averaging methods
in ROCR allow for this? If I can't easily grab the averaged data as
described in Question 1, can someone help me figure out how to average
the lists by threshold in the same way?
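
A minimal base-R sketch of threshold averaging over ragged lists follows. It assumes each fold is given as a vector of cutoffs (starting at Inf and decreasing, laid out like the alpha.values slot) plus matching FPR and TPR vectors (like x.values and y.values), and that scores at or above a cutoff are called positive. It builds a common grid of cutoffs, reads each fold's ROC point at every grid cutoff, and averages across folds. The helper avg_by_threshold and its step-wise lookup are hypothetical choices, not a description of ROCR's internals.

```r
# Hypothetical helper (not part of ROCR): threshold-average ragged
# per-fold ROC curves. alpha, fpr and tpr are lists holding one numeric
# vector per fold, laid out like the alpha.values, x.values and y.values
# slots of a ROCR performance object; folds may differ in length.
avg_by_threshold <- function(alpha, fpr, tpr, n = 50) {
  # Common grid of cutoffs spanning the finite cutoffs of all folds
  a <- unlist(alpha)
  a <- a[is.finite(a)]
  grid <- seq(max(a), min(a), length.out = n)

  # ROC point of one fold at cutoff t, assuming scores >= t are called
  # positive: take the point of the smallest fold cutoff that is >= t
  point_at <- function(al, v, t) {
    idx <- which(al >= t)
    if (length(idx) == 0) return(0)  # no score reaches t: TPR = FPR = 0
    v[idx[which.min(al[idx])]]
  }

  # Evaluate every fold on the grid (one column per fold), then average
  fold_mean <- function(vals) {
    m <- mapply(function(al, v) {
      vapply(grid, function(t) point_at(al, v, t), numeric(1))
    }, alpha, vals)
    rowMeans(matrix(m, nrow = n))
  }

  data.frame(cutoff = grid, fpr = fold_mean(fpr), tpr = fold_mean(tpr))
}

# Toy input: fold 2 has only two points, like part two of the example
alpha <- list(c(Inf, 0.8, 0.4), c(Inf, 0))
fpr   <- list(c(0, 0, 0.5),     c(0, 1))
tpr   <- list(c(0, 0.5, 1),     c(0, 1))
res <- avg_by_threshold(alpha, fpr, tpr, n = 3)
print(res)
```

With a real performance object, a call such as avg_by_threshold(perf@alpha.values, perf@x.values, perf@y.values) would return one averaged point per grid cutoff. Because the lookup only needs some cutoff at or above each grid value, a two-point entry such as c(Inf, 0) needs no special handling in this sketch.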

Question 3. If my cross-validation data happen to have a list entry of
length 2, ROCR throws an error. Please see the second part of my example.
Any suggestions?

# reproducible examples illustrating my questions
## part one ##
library(ROCR)
data(ROCR.xval)
# set up data so it looks more like my real data
sampSize <- c(4, 55, 20, 75, 350, 250, 6, 120, 200, 25)
testSet <- ROCR.xval
# draw a random subset of each fold, sampSize[i] points for fold i
for (i in seq_along(ROCR.xval$predictions)) {
  y <- sample(1:350, sampSize[i])
  testSet$predictions[[i]] <- ROCR.xval$predictions[[i]][y]
  testSet$labels[[i]] <- ROCR.xval$labels[[i]][y]
}
# now massage the data using ROCR, set up for a ROC plot
# (if it errors out here, run the sampling above again)
pred <- prediction(testSet$predictions, testSet$labels)
perf <- performance(pred, "tpr", "fpr")
# create the ROC plot, averaging by cutoff value
plot(perf, avg = "threshold")
# check out the structure of the data
str(perf)
# note the ragged lengths of the list entries; I assume that averaging
# (whether vertical, horizontal, or threshold) somehow accounts for this?

## part two ##
# replace the first list entry with one holding only two values
perf@x.values[[1]] <- c(0, 1)
perf@y.values[[1]] <- c(0, 1)
perf@alpha.values[[1]] <- c(Inf, 0)

plot(perf, avg = "threshold")

## this results in an error with the message:
# Error in if (from == to) rep.int(from, length.out) else as.vector(c(from,  :
# missing value where TRUE/FALSE needed

Thanks in advance for your help
Tim Howard
New York Natural Heritage Program