[R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging

David Winsemius dwinsemius at comcast.net
Thu Sep 24 15:25:39 CEST 2009


On Sep 24, 2009, at 9:09 AM, Tim Howard wrote:

> All,
> I'm trying again with a slightly more generic version of my first question.
> I can extract the plotted values from hist(), boxplot(), and even
> plot.randomForest(). Observe:
>
> # get some data
> dat <- rnorm(100)
> # grab histogram data
> hdat <- hist(dat)
> hdat     #provides details of the hist output
>
> #grab boxplot data
> bdat <- boxplot(dat)
> bdat     #provides details of the boxplot output
>
> # the same works for randomForest
> library(randomForest)
> data(mtcars)
> RFdat <- plot(randomForest(mpg ~ ., mtcars, keep.forest=FALSE, ntree=100),
>               log="y")
> RFdat
>
>
> ##But, I can't use this method in ROCR
> library(ROCR)
> data(ROCR.xval)
> RCdat <- plot(perf, avg="threshold")

That code throws an "object not found" error for perf. Perhaps you defined
perf earlier?

David


>
> RCdat
> ## output:  NULL
>
> Does anyone have any tricks for piping or extracting these data?
> Or, perhaps for steering me in another direction?
>
> Thanks,
> Tim
>
>
> From: "Tim Howard" <tghoward at gw.dec.state.ny.us>
> Subject: [R] ROCR.plot methods, cross validation averaging
> To: <osander at mpi-sb.mpg.de>, <tobias.sing at mpi-sb.mpg.de>,
> 	<r-help at r-project.org>
> Message-ID: <4ABA1079.6D16.00D5.0 at gw.dec.state.ny.us>
> Content-Type: text/plain; charset=US-ASCII
>
> Dear R-help and ROCR developers (Tobias Sing and Oliver Sander) -
>
> I think my first question is generic and could apply to many methods,
> which is why I'm directing this initially to R-help as well as Tobias
> and Oliver.
>
> Question 1. The plot function in ROCR will average your cross validation
> data if asked. I'd like to use that averaged data to find a "best" cutoff
> but I can't figure out how to grab the actual data that get plotted.
> A simple redirect of the plot (such as test <- plot(mydata)) doesn't do it.
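
On Question 1: once an averaged curve is available as plain numbers, choosing a cutoff from it is a short step. A minimal sketch, assuming "best" means the cutoff that maximises Youden's J (tpr minus fpr). That criterion is my assumption, not something stated in the original post, and the avg data frame below holds made-up numbers standing in for an averaged curve.

```r
# Hypothetical averaged curve: fpr and tpr at each cutoff (toy numbers,
# not output from ROCR)
avg <- data.frame(cutoff = c(0.9, 0.7, 0.5, 0.3, 0.1),
                  fpr    = c(0.00, 0.05, 0.20, 0.50, 1.00),
                  tpr    = c(0.30, 0.60, 0.85, 0.95, 1.00))

# Youden's J = tpr - fpr; the cutoff that maximises it is one common
# (assumed) definition of "best"
avg$J <- avg$tpr - avg$fpr
best  <- avg[which.max(avg$J), ]
best
```

Other criteria (for example, the point closest to the top-left corner, or a cost-weighted rule) would only change the line that computes J.
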
>
> Question 2. I am asking ROCR to average lists with varying lengths for
> each list entry. See my example below. None of the ROCR examples have data
> structured in this manner. Can anyone speak to whether the averaging
> methods in ROCR allow for this? If I can't easily grab the data as desired
> from Question 1, can someone help me figure out how to average the lists,
> by threshold, similarly?
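
On Question 2: a minimal sketch of averaging ragged folds by threshold in base R, without going through plot(). It assumes each fold's curve can be linearly interpolated onto a shared grid of cutoffs and that infinite cutoffs can simply be dropped. Both are my assumptions, and this is not necessarily what ROCR does internally. The helper name avg_by_threshold and the toy folds are hypothetical. With a real performance object the three lists would be the x.values, y.values and alpha.values slots.

```r
# Hypothetical helper: average ragged per-fold curves on a common cutoff grid.
# x, y, alpha are lists with one numeric vector per fold; lengths may differ
# between folds but must match within a fold.
avg_by_threshold <- function(x, y, alpha, n = 50) {
  # Skip folds with fewer than two finite cutoffs: approx() cannot
  # interpolate from a single point
  ok <- vapply(alpha, function(a) sum(is.finite(a)) >= 2, logical(1))
  stopifnot(any(ok))
  x <- x[ok]; y <- y[ok]; alpha <- alpha[ok]

  # Shared grid spanning every finite cutoff seen in any fold
  rng  <- range(unlist(alpha), finite = TRUE)
  grid <- seq(rng[1], rng[2], length.out = n)

  # Interpolate one fold onto the grid, ignoring its non-finite cutoffs;
  # rule = 2 holds the end values constant outside the fold's own range
  interp <- function(val, a) {
    fin <- is.finite(a)
    approx(a[fin], val[fin], xout = grid, rule = 2, ties = mean)$y
  }
  xs <- vapply(seq_along(x), function(i) interp(x[[i]], alpha[[i]]), numeric(n))
  ys <- vapply(seq_along(y), function(i) interp(y[[i]], alpha[[i]]), numeric(n))

  # One row per grid cutoff, averaged across folds
  data.frame(cutoff = grid, x = rowMeans(xs), y = rowMeans(ys))
}

# Toy folds of different lengths (made-up numbers)
x     <- list(c(0, 0, 0.5, 1),       c(0, 0.2, 1))
y     <- list(c(0, 0.5, 1, 1),       c(0, 0.8, 1))
alpha <- list(c(Inf, 0.9, 0.5, 0.1), c(Inf, 0.7, 0.3))
avg <- avg_by_threshold(x, y, alpha, n = 5)
avg
```

Because every fold is resampled onto the same grid before averaging, the differing fold lengths stop mattering. The price is that short folds contribute interpolated (and, at the ends, extrapolated) values.
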
>
> Question 3. If my cross validation data happen to have a list entry whose
> length = 2, ROCR errors out. Please see the second part of my example.
> Any suggestions?
>
> #reproducible examples exemplifying my questions
> ##part one##
> library(ROCR)
> data(ROCR.xval)
> # set up data so it looks more like my real data
> sampSize <- c(4, 55, 20, 75, 350, 250, 6, 120, 200, 25)
> testSet <- ROCR.xval
> # do the extraction
> for (i in 1:length(ROCR.xval[[1]])){
>  y <- sample(c(1:350),sampSize[i])
>  testSet$predictions[[i]] <- ROCR.xval$predictions[[i]][y]
>  testSet$labels[[i]] <- ROCR.xval$labels[[i]][y]
>  }
> # now massage the data using ROCR, set up for a ROC plot
> # if it errors out here, run the above sample again.
> pred <- prediction(testSet$predictions, testSet$labels)
> perf <- performance(pred,"tpr","fpr")
> # create the ROC plot, averaging by cutoff value
> plot(perf, avg="threshold")
> # check out the structure of the data
> str(perf)
> # note the ragged edges of the list and that I assume averaging
> # whether it be vertical, horizontal, or threshold, somehow
> # accounts for this?
>
> ## part two ##
> # add a list entry with only two values
> perf@x.values[[1]] <- c(0,1)
> perf@y.values[[1]] <- c(0,1)
> perf@alpha.values[[1]] <- c(Inf,0)
>
> plot(perf, avg="threshold")
>
> ##output results in an error with this message
> # Error in if (from == to) rep.int(from, length.out) else as.vector(c(from,  :
> #   missing value where TRUE/FALSE needed
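
On Question 3: one possible cause, which I have not verified against the ROCR source, is that the infinite cutoff gets replaced by a value derived from the gaps between the finite cutoffs. With only one finite cutoff there are no gaps, so the replacement would be NaN, and a later seq() call over the cutoff range would then fail. The padding rule in this sketch is an assumption used for illustration. The base R behaviour it demonstrates is not.

```r
# The two-value fold from the example: one infinite cutoff, one finite
a   <- c(Inf, 0)
fin <- a[is.finite(a)]            # a single finite cutoff: 0

# Assumed padding rule (hypothetical): replace Inf with the largest finite
# cutoff plus the mean gap between neighbouring finite cutoffs
gap <- mean(abs(diff(fin)))       # diff() of one value is empty, so the mean is NaN
a[is.infinite(a)] <- max(fin) + gap
a                                 # the Inf has become NaN

# A possible workaround: flag folds with fewer than two finite cutoffs
# so they can be dropped before averaging
alpha  <- list(c(Inf, 0), c(Inf, 0.8, 0.4, 0.1))
usable <- vapply(alpha, function(v) sum(is.finite(v)) >= 2, logical(1))
usable
```

If this is indeed the mechanism, filtering the x.values, y.values and alpha.values lists with a mask like usable before plotting should avoid the error, at the cost of discarding those folds.
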
>
>
> Thanks in advance for your help
> Tim Howard
> New York Natural Heritage Program
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



