[R] p-values for classification

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Jul 1 14:01:05 CEST 2005

Not really an R question.

Most classifiers will produce predicted probabilities, and you can check 
their accuracy.  There are lots of details in my PRNN book, and some 
examples in MASS4.
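
As a minimal sketch of checking predicted probabilities (not taken from 
PRNN or MASS4), the following fits a logistic discriminant with glm() and 
scores its test-set probabilities.  The data are simulated and purely 
hypothetical, as are the variable names; only the 575/50 split comes from 
the question.  The Brier score and log score are just two common choices.

```r
set.seed(3)
n <- 625                                  # 575 training + 50 test points
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))            # hypothetical predictors
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2))       # hypothetical true model
train <- seq_len(575)

# Logistic discriminant fitted on the training set only
fit <- glm(y ~ x1 + x2, family = binomial, data = dat[train, ])

# Predicted probabilities of class 1 on the held-out test set
p <- predict(fit, newdata = dat[-train, ], type = "response")
y_test <- dat$y[-train]

# Two summaries of how accurate the probabilities are (smaller is better)
brier     <- mean((p - y_test)^2)
log_score <- -mean(y_test * log(p) + (1 - y_test) * log(1 - p))
c(brier = brier, log_score = log_score)
```

Both scores use the probabilities themselves rather than the thresholded 
classes, so they are less coarse than a confusion matrix on 50 points.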

I suggest you adjust your training and test sets to be more nearly equal, 
or use cross-validation.
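
A rough sketch of the cross-validation option, assuming the 575 + 50 
points are pooled and a logistic discriminant is the classifier.  The 
simulated data, the 10 folds and the 0.5 threshold are all assumptions 
made for illustration.

```r
set.seed(1)
n <- 625                                  # pooled training and test points
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))            # hypothetical predictors
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2))       # hypothetical true model

k <- 10                                   # number of folds (an arbitrary choice)
fold <- sample(rep(seq_len(k), length.out = n))
pred <- numeric(n)

# Each point is predicted by a model that never saw it
for (i in seq_len(k)) {
  fit <- glm(y ~ x1 + x2, family = binomial, data = dat[fold != i, ])
  pred[fold == i] <- predict(fit, newdata = dat[fold == i, ],
                             type = "response")
}

cls <- as.integer(pred > 0.5)
tab <- table(truth     = factor(dat$y, levels = 0:1),
             predicted = factor(cls,   levels = 0:1))

sens <- tab["1", "1"] / sum(tab["1", ])   # TP / (TP + FN)
fdr  <- tab["0", "1"] / sum(tab[, "1"])   # FP / (TP + FP)
c(sensitivity = sens, fdr = fdr)
```

Every one of the 625 points then contributes to the confusion matrix, 
rather than only 50 of them.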

I don't see how shuffling the labels will help: you want to know how well 
a classifier does when there is a real relationship between the 
explanatory variables and the class.  To take a simple example, suppose 
the classes are clearly linearly separable.  Then a logistic discriminant 
will have nigh-perfect performance on the actual data, but very poor 
performance on permuted labels.  You would do a lot better to simulate 
from a good fitted model, so-called parametric bootstrapping.
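
One possible reading of the parametric bootstrap suggestion, sketched 
under stated assumptions: fit a model to all the data, simulate new class 
labels from its fitted probabilities, then refit on the training part and 
evaluate on the test part each time.  The simulated data, the logistic 
model, B = 200 and the helper perf() are hypothetical choices, not 
something given in the thread.

```r
set.seed(2)
n_train <- 575; n_test <- 50              # sizes from the question
n <- n_train + n_test
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))            # hypothetical predictors
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2))       # hypothetical true model
train <- seq_len(n_train)

# Sensitivity and false discovery rate from true and predicted classes
perf <- function(truth, cls) {
  tp <- sum(truth == 1 & cls == 1)
  fp <- sum(truth == 0 & cls == 1)
  c(sens = tp / sum(truth == 1), fdr = fp / (tp + fp))
}

# The "good fitted model" to simulate from (assumed here to be logistic)
fit0  <- glm(y ~ x1 + x2, family = binomial, data = dat)
p_hat <- fitted(fit0)

B <- 200                                  # number of simulated data sets
boot <- t(replicate(B, {
  sim <- dat
  sim$y <- rbinom(n, 1, p_hat)            # new labels drawn from the fitted model
  fit <- glm(y ~ x1 + x2, family = binomial, data = sim[train, ])
  p   <- predict(fit, newdata = sim[-train, ], type = "response")
  perf(sim$y[-train], as.integer(p > 0.5))
}))

# Simulated spread of sensitivity and fdr for a 50-point test set
ci <- apply(boot, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
ci
```

Unlike permuted labels, the simulated labels keep the relationship 
between the explanatory variables and the class, so the spread reflects 
how variable the measures are when the classifier has something to find.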

On Fri, 1 Jul 2005 Arne.Muller at sanofi-aventis.com wrote:

> Dear All,
> I'm classifying some data with various methods (binary classification). 
> I'm interpreting the results via a confusion matrix from which I 
> calculate the sensitivity and the fdr. The classifiers are trained on 
> 575 data points and my test set has 50 data points.
> I'd like to calculate p-values for obtaining <=fdr and >=sensitivity for 
> each classifier. I was thinking about shuffling/bootstrapping the labels 
> of the test set, classifying them and calculating the p-value from the 
> resulting normally distributed random fdr and sensitivity.
> The problem is that it's rather slow when running many rounds of 
> shuffling/classification (I'd like to do this for many classifiers and 
> parameter combinations). In addition, classification of the 50 test data 
> points with shuffled labels realistically produces only a very limited 
> number of possible fdrs and sensitivities, and I'm wondering if I can 
> really believe these values to be normal.
> Basically I'm looking for a way to calculate the p-values analytically. 
> I'd be happy for any suggestions, web addresses or references.
> 	Kind regards,
> 	Arne

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
