[R] binning results

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Aug 5 20:22:50 CEST 2009


Hi,

On Aug 5, 2009, at 2:11 PM, Noah Silverman wrote:

> Hello,
>
> I asked this as part of a previous message, but never really figured  
> out a usable solution.  So this is a second attempt.
>
> I have a process containing an SVM.  The end result is the
> probability that the class is true.  That result is added back to  
> the original data.
>
> So I wind up with a data.frame that looks like this
>
> label,v1,v2,v3,prob_true
>
> What I want to do is measure how accurate my model is for each range  
> of probability.  (I've seen this done in a few published papers and
> found it a very useful way to visualize things.)
>
> My hope/guess is that there is some kind of package for R that does  
> this since it should be a common need.
>
> Here is an example of what I'd like to be able to generate:
>
> range      number of items   mean(probability)   true_accuracy
> 100-90%    20                .924                .90
> 90-80%     50                .825                .84
> 80-70%     214               .75                 .71
> etc...
>
> range is the range of predicted values by the SVM
> mean(probability) is the mean of the PREDICTED probability of items  
> in that range
> true_accuracy is the mean of the ACTUAL probability of items in that  
> range.
>
> In English I would explain it as, "Of the data where our SVM  
> predicted a true probability of 70-80%, the data was actually 71%  
> true."
>
> It might be really helpful to be able to graph this somehow,
> with mean(predicted_probability) on one axis and
> mean(true_probability) on the other axis.
> (Again, there must be some package in R for this?)
>
> Any thoughts, comments, ideas, etc. would be appreciated!

Take a look at the cut function and the code in the examples of ?cut
(e.g., look at the output when it is combined with table, as in
table(cut(..)) ).
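
As a rough illustration of the table(cut(..)) idea, here is a minimal sketch on a made-up vector. The name prob_true is borrowed from your column name, and the values and break points are invented for the example:

```r
# Made-up predicted probabilities standing in for the prob_true column
prob_true <- c(0.95, 0.91, 0.85, 0.83, 0.78, 0.72, 0.71, 0.65)

# cut() assigns each value to an interval (right-closed by default);
# table() then counts how many values fall into each interval
bins <- cut(prob_true, breaks = c(0.6, 0.7, 0.8, 0.9, 1.0))
counts <- table(bins)
print(counts)
```

The result of cut() is a factor, so it can be used as a grouping variable with other functions as well.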

Passing your own vector for the ``breaks`` param in order to bin as
you like should get you 90% of the way to building the table you're
after.
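
Something along these lines might cover the remaining 10%. It is only a sketch under a few assumptions: the data below is simulated, label is assumed to be coded 0/1, and the names d, calib, n_items, and mean_prob are made up for the example. The binned factor from cut() is used with tapply() to get per-bin means:

```r
set.seed(1)

# Simulated stand-in for the real data: a predicted probability and a
# 0/1 label that is true with that probability (an assumption)
n <- 500
d <- data.frame(prob_true = runif(n))
d$label <- rbinom(n, 1, d$prob_true)

# Bin the predictions into ten equal-width ranges using custom breaks
d$bin <- cut(d$prob_true, breaks = seq(0, 1, by = 0.1),
             include.lowest = TRUE)

# One row per bin: count, mean predicted probability, observed fraction true
calib <- data.frame(
  range         = levels(d$bin),
  n_items       = as.vector(table(d$bin)),
  mean_prob     = as.vector(tapply(d$prob_true, d$bin, mean)),
  true_accuracy = as.vector(tapply(d$label, d$bin, mean))
)
print(calib)
```

For the graph, plot(calib$mean_prob, calib$true_accuracy) followed by abline(0, 1) should give predicted versus observed with a reference diagonal. Note that a bin with no items would come out as NA in the two mean columns.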

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



