[R] "exact" p-values

Therneau, Terry M., Ph.D. therneau at mayo.edu
Sat Mar 20 13:43:30 CET 2021


I am late to this discussion -- I read R-help as a once-a-day summary.  A few comments.

1. In the gene-discovery subfield of statistics (SNP studies, etc.)  there is a huge 
multiple-testing problem.  In defense, the field thinks in terms of thresholds like 1e-5 
or 1e-10 rather than the .05 or .01 most of us are used to.   In that literature, they do 
care about  1e-16 vs 1e-20.    We can all argue about whether that is a sensible approach 
or not, but it is what it is.  I think that this is the context of the journal's request, 
i.e., they want the actual number, however you calculate it.

My own opinion is that these rarefied p-values are on an arbitrary scale, one that no longer 
has a probability interpretation.  For the central limit theorem to be accurate that far 
from the mean would require a sample size beyond imagination ('number of atoms in the 
earth' order of magnitude).  Such a scale may still be useful, but it is not really a probability.

2. The label of "Fisher's exact test" has caused decades of confusion.  In this context 
the word means "a particular test whose distribution can be completely enumerated": it 
does not mean either "correct" or "precise".  The original enumeration methods had 
limitations with respect to the sample size or the presence of complications such as tied 
values; from the discussion so far it would appear that the 'exact' argument of 
wilcox.test uses such a method.  Cyrus Mehta did nice work on improved algorithms that do 
not have these restrictions, methods that have been refined and expanded in the software 
offerings from Cytel among others.  Perhaps someone could update R's code to use this, but 
see 3 below.
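
For anyone who wants to see the limitation directly, a minimal made-up example: when 
exact = TRUE is requested but the data contain ties, wilcox.test() warns and falls back 
to the normal approximation.

    set.seed(1)
    x <- sample(1:5, 30, replace = TRUE)   # heavily tied data
    y <- sample(2:6, 30, replace = TRUE)
    wilcox.test(x, y, exact = TRUE)
    ## Warning: cannot compute exact p-value with ties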

My own opinion is that permutation tests are an important tool, one "wrench" in our 
statistical toolbox.   But they are only one tool out of many.  I am quite put off by 
arguments that purposefully conflate "exact" and "correct".

3. The concordance statistic C, the Wilcoxon test, and Somers' d are all the same 
statistic, just written a little differently.  (Somers' d is essentially Kendall's tau, but 
with a slightly different rule for ties.)  A test of C = .5 is the same as a Wilcoxon test.  For 
a binary response, C is the area under the receiver operating characteristic curve (AUC).  The 
concordance command in the survival package computes this statistic for continuous, binary, or 
censored responses.  The variance is based on a jackknife argument and is computed by 
organizing the data into a binary tree structure, very similar to the methods used by 
Mehta; the computation is efficient for large n and is valid for ties.  Perhaps add a link in the 
wilcox.test help page?
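
To make the equivalence concrete, a small simulated sketch (the data and variable names 
are made up): for a binary response, the concordance from the survival package equals the 
AUC, which is just the Wilcoxon/Mann-Whitney statistic rescaled by the number of pairs.

    library(survival)
    set.seed(2)
    y <- rep(0:1, each = 50)            # binary response
    x <- rnorm(100, mean = y)           # marker, shifted upward in the y = 1 group

    fit <- concordance(y ~ x)           # C statistic with jackknife-based variance
    c(C = fit$concordance, se = sqrt(fit$var))

    w <- wilcox.test(x[y == 1], x[y == 0])
    w$statistic / (50 * 50)             # U / (n1 * n0) = AUC = C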

Footnote: AUC is a special case of C but not vice versa.  People sometimes try to extend 
AUC to the other data types, but IMHO with only moderate success.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
therneau at mayo.edu

"TERR-ree THUR-noh"

