[R] p-values for classification
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Jul 1 14:01:05 CEST 2005
Not really an R question.
Most classifiers will produce predicted probabilities, and you can check
their accuracy. There are lots of details in my PRNN book (Pattern
Recognition and Neural Networks), and some examples in MASS4 (Modern
Applied Statistics with S, 4th edition).
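As a minimal sketch of what "checking the accuracy" of predicted probabilities can mean, the following Python snippet computes two common summaries, the Brier score and the log-loss. The probabilities and labels are made up for illustration, and the function names are hypothetical rather than taken from any of the books mentioned.

```python
import math

def brier_score(probs, labels):
    """Mean squared difference between predicted P(class 1) and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the observed labels; eps guards log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total / len(labels)

# Made-up predicted probabilities of class 1 and true labels (illustration only).
probs = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]

brier = brier_score(probs, labels)
nll = log_loss(probs, labels)
print("Brier score:", brier)
print("Log-loss:", nll)
```

Lower is better for both; a classifier that always predicts 0.5 scores 0.25 on the Brier scale, which gives a rough baseline.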
I suggest you adjust your training and test sets to be more nearly equal in size,
or use cross-validation.
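A rough sketch of the cross-validation option, in Python with only the standard library: all 625 points (575 + 50) are pooled, split into 10 folds, and the confusion-matrix counts are accumulated over the held-out folds. The data are simulated stand-ins, and the one-variable threshold classifier and all function names are assumptions made for illustration, not the poster's classifiers.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Toy classifier: threshold halfway between the two class means."""
    n1 = sum(ys)
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / n1
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / (len(ys) - n1)
    return (m0 + m1) / 2, m1 > m0

def predict(model, x):
    """Return 1 if x falls on the class-1 side of the threshold, else 0."""
    thr, pos_above = model
    return int((x > thr) == pos_above)

def cv_confusion(xs, ys, k=10, seed=0):
    """Pool TP/FP/TN/FN over the k held-out folds."""
    tp = fp = tn = fn = 0
    for held in kfold_indices(len(xs), k, seed):
        held_set = set(held)
        train = [i for i in range(len(xs)) if i not in held_set]
        model = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            pred = predict(model, xs[i])
            if pred == 1 and ys[i] == 1:
                tp += 1
            elif pred == 1 and ys[i] == 0:
                fp += 1
            elif pred == 0 and ys[i] == 0:
                tn += 1
            else:
                fn += 1
    return tp, fp, tn, fn

# Simulated stand-in for the pooled 625 data points (illustration only).
rng = random.Random(1)
ys = [i % 2 for i in range(625)]
xs = [rng.gauss(2.0 * y, 1.0) for y in ys]

tp, fp, tn, fn = cv_confusion(xs, ys, k=10)
sens = tp / (tp + fn)
fdr = fp / (tp + fp)
print("sensitivity:", sens, "fdr:", fdr)
```

Every point is used once for testing, so the sensitivity and fdr rest on 625 predictions instead of 50.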
I don't see how shuffling the labels will help: you want to know how well
a classifier does when there is a real relationship between the
explanatory variables and the class. To take a simple example, suppose
the classes are clearly linearly separable. Then a logistic discriminant
will have nigh-perfect performance on the actual data, but very poor
performance on permuted labels. You would do a lot better to simulate
from a good fitted model, the so-called parametric bootstrap.
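One way to read the parametric bootstrap suggestion, sketched in Python with only the standard library: fit a model to the data, simulate many test sets of 50 points from it, classify each, and look at the spread of the resulting sensitivity and fdr. The fitted model here (one explanatory variable, Gaussian within each class), the simulated training data, and all function names are assumptions made for illustration; a real analysis would simulate from whatever model fits the actual data well.

```python
import random
import statistics

def fit_gaussian_model(xs, ys):
    """Fit a class prior plus per-class mean and sd (an assumed, very simple model)."""
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    return {
        "p1": len(x1) / len(xs),
        "mean": {0: statistics.mean(x0), 1: statistics.mean(x1)},
        "sd": {0: statistics.stdev(x0), 1: statistics.stdev(x1)},
    }

def simulate(model, n, rng):
    """Draw n labels from the fitted prior, then x from the fitted class density."""
    ys = [int(rng.random() < model["p1"]) for _ in range(n)]
    xs = [rng.gauss(model["mean"][y], model["sd"][y]) for y in ys]
    return xs, ys

def sensitivity_and_fdr(preds, ys):
    """Sensitivity = TP/(TP+FN); fdr = FP/(TP+FP); NaN if a denominator is 0."""
    tp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, ys) if p == 0 and y == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    fdr = fp / (tp + fp) if tp + fp else float("nan")
    return sens, fdr

def parametric_bootstrap(model, classify, n_test=50, reps=2000, seed=0):
    """Simulate reps test sets from the fitted model and score the classifier on each."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xs, ys = simulate(model, n_test, rng)
        draws.append(sensitivity_and_fdr([classify(x) for x in xs], ys))
    return draws

# Simulated stand-in for the 575 training points (illustration only).
rng = random.Random(1)
train_y = [i % 2 for i in range(575)]
train_x = [rng.gauss(2.0 * y, 1.0) for y in train_y]

model = fit_gaussian_model(train_x, train_y)
threshold = (model["mean"][0] + model["mean"][1]) / 2
draws = parametric_bootstrap(model, lambda x: int(x > threshold))

sens = sorted(s for s, _ in draws if s == s)  # s == s drops NaN values
lo, hi = sens[int(0.025 * len(sens))], sens[int(0.975 * len(sens)) - 1]
print("central 95% range of simulated sensitivity:", lo, hi)
```

The width of that range shows how much sensitivity can vary on a test set of only 50 points even when the relationship between the explanatory variable and the class is real.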
On Fri, 1 Jul 2005 Arne.Muller at sanofi-aventis.com wrote:
> Dear All,
> I'm classifying some data with various methods (binary classification).
> I'm interpreting the results via a confusion matrix from which I
> calculate the sensitivity and the fdr. The classifiers are trained on
> 575 data points and my test set has 50 data points.
> I'd like to calculate p-values for obtaining <=fdr and >=sensitivity for
> each classifier. I was thinking about shuffling/bootstrapping the labels of
> the test set, classifying them and calculating the p-value from the
> resulting normally distributed random fdr and sensitivity.
> The problem is that it's rather slow when running many rounds of
> shuffling/classification (I'd like to do this for many classifiers and
> parameter combinations). In addition classification of the 50 test data
> points with shuffled labels realistically produces only a very limited
> number of possible fdr's and sensitivities, and I'm wondering if I can
> really believe these values to be normal.
> Basically I'm looking for a way to calculate the p-values analytically.
> I'd be happy for any suggestions, web-addresses or references.
> kind regards,
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595