[R] p-values < 2.2e-16 not reported

Will Eagle will.eagle at gmx.net
Thu May 20 18:16:09 CEST 2010


> On 2010-05-20 08:52, Shi, Tao wrote:
>> Will,
>>
>>   I'm wondering if you have any
>> insights after looking at the cor.test source code.  It seems to be fine to me, as the p value is either calculated by "your first method" or a
>> .C code.
>>
>> ...Tao

Dear Tao,

I think the described problem of p-values < 2.2e-16 not being reported 
(or set to zero, respectively) applies at least to the cor.test(, 
method="pearson", alternative=c("greater","two.sided")) function, but 
also other statistical test functions. You find the critical piece of 
code with methods(cor.test) and getAnywhere(cor.test.default) in the 
following lines:
...
p <- pt(STATISTIC, df) # line 27
...
PVAL <- switch(alternative,  # lines 144-145, reformatted
less = p,
greater = 1 - p,
two.sided = 2 * min(p, 1 - p))

This means that if you calculate a pearson correlation (which is the 
default) with option alternativ="two.sided" (which is the default) or 
alternative="greater", an additive operation of the 2nd kind is 
performed, which causes that a very small p-value is set to zero. This 
problem should not apply to alternative="less" since the p-value has 
been generated without an additive operation with the pt() function.

What I still find confusing is that:

 > pt(10,100,lower=FALSE)
[1] 4.950844e-17
 > pt(10,100,lower=TRUE)   # should give 1 - 4.950844e-17
[1] 1
 > 1-pt(10,100,lower=TRUE) # should give 4.950844e-17
[1] 0

So it seems that this distribution function(s) only give the exact value 
if the option lower.tail=FALSE is used. Therefore, also cor.test(..., 
method="pearson", alternative="less") might not report p-values < 
2.2e-16. I find this behavior of R very unintuitive.

The .C code routines are by the way only used for methods=c("kendall", 
"spearman") as far as I have seen.

Best, Will



More information about the R-help mailing list