[R] "exact" p-values
Therneau, Terry M., Ph.D.
therneau at mayo.edu
Sat Mar 20 13:43:30 CET 2021
I am late to this discussion -- I read R-help as a once-a-day summary. A few comments.
1. In the gene-discovery subfield of statistics (SNP studies, etc.) there is a huge
multiple-testing problem. In defense, the field thinks in terms of thresholds like 1e-5
or 1e-10 rather than the .05 or .01 most of us are used to. In that literature, they do
care about 1e-16 vs 1e-20. We can all argue about whether that is a sensible approach
or not, but it is what it is. I think that this is the context of the journal's request,
i.e., they want the actual number, however you calculate it.
My own opinion is that these rarefied p-values are an arbitrary scale, one that no longer
has a probability interpretation. For the central limit theorem to be correct that far
from the mean requires a sample size that is beyond imagination ('number of atoms in the
earth' order of size). Such a scale may still be useful, but it's not really a probability.
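On "however you calculate it": a minimal sketch, in Python with only the standard library, of why tail areas this small need some numerical care. The function names are invented for illustration, and a standard normal reference distribution is assumed purely for the arithmetic; whether the resulting number means anything that far out is the question raised above.

```python
import math

def upper_tail_naive(z):
    """Upper-tail normal area computed as 1 - CDF(z).

    Once CDF(z) rounds to 1.0 in double precision the result collapses to 0.
    """
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - cdf

def upper_tail_erfc(z):
    """Upper-tail normal area computed directly from the complementary
    error function, which avoids the subtraction from 1."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def log10_upper_tail(z):
    """Same tail area on a log10 scale, the scale on which 1e-16 vs 1e-20
    comparisons are usually made."""
    return math.log10(upper_tail_erfc(z))

if __name__ == "__main__":
    for z in (1.96, 5.0, 9.0):
        print(z, upper_tail_naive(z), upper_tail_erfc(z), log10_upper_tail(z))
```

The two routes agree at ordinary values of z and part ways only in the far tail, which is where the journal's request lives.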
2. The label of "Fisher's exact test" has caused decades of confusion. In this context
the word "exact" means "a particular test whose distribution can be completely enumerated": it
does not mean either "correct" or "precise". The original enumeration methods had
limitations with respect to the sample size or the presence of complications such as tied
values; from the discussion so far it would appear that the 'exact' argument of
wilcox.test uses such a method. Cyrus Mehta did nice work on improved algorithms that do
not have these restrictions, methods that have been refined and expanded in the software
offerings from Cytel among others. Perhaps someone could update R's code to use these
methods, but see 3 below.
My own opinion is that permutation tests are an important tool, one "wrench" in our
statistical toolbox. But they are only one tool out of many. I am quite put off by
arguments that purposefully conflate "exact" and "correct".
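To make "completely enumerated" concrete, here is a minimal sketch in Python (standard library only) that enumerates the null distribution of the rank-sum statistic for two small samples. It is a brute-force illustration and not the algorithm used by wilcox.test or by Mehta's methods; the function name is hypothetical, and it simply refuses tied values, which is exactly the kind of limitation mentioned in point 2.

```python
from itertools import combinations
from math import comb

def exact_rank_sum_pvalue(x, y):
    """Two-sided p-value for the Wilcoxon rank-sum statistic, obtained by
    listing every way of assigning len(x) of the pooled ranks to group x.

    Brute force: the number of assignments is comb(len(x) + len(y), len(x)),
    so this is only usable for small samples.
    """
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied values are not handled by this simple enumeration")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n, total = len(x), len(pooled)
    observed = sum(rank[value] for value in x)
    center = n * (total + 1) / 2          # mean of the rank sum under the null
    observed_dev = abs(observed - center)
    # Count assignments whose rank sum is at least as far from the center.
    extreme = sum(
        1
        for subset in combinations(range(1, total + 1), n)
        if abs(sum(subset) - center) >= observed_dev
    )
    return extreme / comb(total, n)

if __name__ == "__main__":
    print(exact_rank_sum_pvalue([1.1, 2.2, 3.3], [4.4, 5.5, 6.6, 7.7]))
```

With 3 and 4 observations there are only 35 assignments, so the smallest attainable two-sided p-value is 2/35: "exact" describes the enumeration, not the precision or correctness of the answer.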
3. The concordance statistic C, the Wilcoxon test, and Somers' d are all the same
statistic, just written a little differently. (Somers' d is essentially Kendall's tau, but
with a slightly different rule for ties.) A test for C=.5 is the same as a Wilcoxon. For
a binary response, C equals the area under the receiver operating characteristic curve
(AUC). The concordance function in the survival package computes this statistic for
continuous, binary, or censored responses. The variance is based on a jackknife argument.
The computation organizes the data into a binary tree structure, very similar to the
methods used by Mehta; it is efficient for large n and is valid for ties. Perhaps add a
link in the wilcox.test help page?
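The equivalence for a binary response can be checked by hand. The following is a minimal sketch in Python (standard library only), with hypothetical function names: one function counts concordant pairs directly, the other goes through the Wilcoxon rank sum with midranks for ties. Both are simple O(n^2) illustrations and not the binary-tree computation used by the concordance function in the survival package.

```python
def concordance_binary(score, label):
    """C for a 0/1 response: the fraction of (positive, negative) pairs in
    which the positive case has the larger score, counting ties as 1/2."""
    pos = [s for s, l in zip(score, label) if l == 1]
    neg = [s for s, l in zip(score, label) if l == 0]
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos) * len(neg))

def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    return [
        sum(v < vi for v in values) + (sum(v == vi for v in values) + 1) / 2
        for vi in values
    ]

def concordance_from_rank_sum(score, label):
    """The same quantity via the Wilcoxon rank sum W of the positives:
    U = W - n1 * (n1 + 1) / 2 and C = U / (n1 * n0)."""
    ranks = midranks(score)
    n1 = sum(label)
    n0 = len(label) - n1
    w = sum(r for r, l in zip(ranks, label) if l == 1)
    u = w - n1 * (n1 + 1) / 2
    return u / (n1 * n0)

if __name__ == "__main__":
    score = [0.1, 0.4, 0.35, 0.8, 0.4]
    label = [0, 0, 1, 1, 1]
    print(concordance_binary(score, label), concordance_from_rank_sum(score, label))
```

The two functions return the same number for any data, tied or not, which is the sense in which C, the Wilcoxon statistic, and the AUC are one statistic written differently.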
Footnote: AUC is a special case of C but not vice versa. People sometimes try to extend
AUC to the other data types, but IMHO with only moderate success.
Terry M Therneau, PhD
Department of Health Science Research
therneau using mayo.edu