[R] p-values for classification

Arne.Muller@sanofi-aventis.com Arne.Muller at sanofi-aventis.com
Fri Jul 1 12:14:20 CEST 2005

Dear All,

I'm classifying some data with various methods (binary classification). I interpret the results via a confusion matrix, from which I calculate the sensitivity and the false discovery rate (FDR). The classifiers are trained on 575 data points, and my test set has 50 data points.

I'd like to calculate p-values for obtaining an FDR <= the observed one and a sensitivity >= the observed one for each classifier. I was thinking about shuffling/bootstrapping the labels of the test set, classifying them, and calculating the p-value from the resulting random FDR and sensitivity values, which I'd assume to be normally distributed.
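A minimal sketch of the shuffling idea, in Python with the standard library only. It rests on assumptions that are not part of the description above: labels and predictions are coded as 0/1, the classifier is trained once and its test-set predictions are held fixed while only the labels are shuffled, and the p-value is taken as an empirical tail fraction (with an add-one correction) rather than from a fitted normal distribution. The function names and the example counts are hypothetical.

```python
import random


def confusion_rates(labels, predictions):
    """Return (sensitivity, fdr) for binary labels/predictions coded as 0/1."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, fdr


def permutation_p_values(labels, predictions, n_rounds=2000, seed=0):
    """Empirical p-values for sensitivity >= observed and FDR <= observed."""
    rng = random.Random(seed)
    obs_sens, obs_fdr = confusion_rates(labels, predictions)
    shuffled = list(labels)
    hits_sens = 0
    hits_fdr = 0
    for _ in range(n_rounds):
        rng.shuffle(shuffled)  # shuffle labels only; predictions stay fixed
        sens, fdr = confusion_rates(shuffled, predictions)
        hits_sens += sens >= obs_sens
        hits_fdr += fdr <= obs_fdr
    # Add-one correction so the estimate is never exactly zero.
    return (hits_sens + 1) / (n_rounds + 1), (hits_fdr + 1) / (n_rounds + 1)


# Hypothetical test set of 50 points: 20 positives, 30 negatives.
labels = [1] * 20 + [0] * 30
# Hypothetical predictions: 15 of the positives and 5 of the negatives called positive.
predictions = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 25

obs_sens, obs_fdr = confusion_rates(labels, predictions)
p_sens, p_fdr = permutation_p_values(labels, predictions)
print(obs_sens, obs_fdr, p_sens, p_fdr)
```

Because the predictions are fixed in this sketch, each round costs only a shuffle and a count, with no re-classification. With both margins of the confusion matrix fixed, sensitivity and FDR are both determined by the number of true positives, so the two p-values coincide here.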

The problem is that it's rather slow when running many rounds of shuffling/classification (I'd like to do this for many classifiers and parameter combinations). In addition, classifying the 50 test data points with shuffled labels realistically produces only a very limited number of possible FDRs and sensitivities, and I'm wondering whether I can really believe these values to be normally distributed.
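To illustrate the discreteness concern, here is a rough sketch in Python. It assumes the predictions on the test set are held fixed while the labels are shuffled; under that assumption the number of true positives follows a hypergeometric distribution, so the few possible values and their probabilities can be enumerated directly. The counts used (50 test points, 20 actual positives, 20 predicted positives, 15 true positives) and the function names are hypothetical.

```python
from math import comb


def tp_distribution(n_total, n_positive, n_predicted_positive):
    """Exact distribution of the true-positive count under label shuffling.

    With predictions fixed, TP is hypergeometric: n_predicted_positive draws
    from n_total items, of which n_positive are actual positives.
    """
    lowest = max(0, n_positive + n_predicted_positive - n_total)
    highest = min(n_positive, n_predicted_positive)
    denom = comb(n_total, n_predicted_positive)
    return {
        tp: comb(n_positive, tp)
        * comb(n_total - n_positive, n_predicted_positive - tp)
        / denom
        for tp in range(lowest, highest + 1)
    }


def upper_tail(dist, tp_observed):
    """P(TP >= tp_observed), i.e. sensitivity at least as high as observed."""
    return sum(prob for tp, prob in dist.items() if tp >= tp_observed)


# Hypothetical counts: 50 test points, 20 actual positives, 20 predicted positives.
dist = tp_distribution(50, 20, 20)
n_values = len(dist)          # number of distinct sensitivities / FDRs possible
tail = upper_tail(dist, 15)   # hypothetical observed TP = 15
print(n_values, tail)
```

With these hypothetical counts only 21 distinct true-positive counts, and hence only 21 distinct sensitivity and FDR values, can occur, which is why a normal approximation looks doubtful. The tail sum is the same quantity a one-sided Fisher exact test on the 2x2 confusion matrix computes.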

Basically, I'm looking for a way to calculate the p-values analytically. I'd be happy about any suggestions, web addresses or references.

	kind regards,

