[R] Query about wilcox.test() P-value

Wed Jul 14 18:05:35 CEST 2010

You need to distinguish between how a value is stored in an R object, with full floating point precision, and how that value is displayed (printed) in the console by a print "method".

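In this case, wilcox.test() returns an object of class 'htest' (as noted in the Value section of ?wilcox.test). When the result of wilcox.test() is printed to the console (using print.htest()), the p-value is displayed using the function format.pval(), which reports any value below its 'eps' argument (.Machine$double.eps by default) as "< eps". That is the lower limit you are seeing. Called with its default arguments, it returns the following (print.htest() asks for fewer digits, hence the "2.2e-16" in your output):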

> format.pval(2.928121e-165)
[1] "< 2.22e-16"

This is common in R, where floating point values are not printed to full precision. What is displayed depends on several factors: in some cases the specific print or formatting method that is applied, in others the default options in R (see ?print.default).
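
To make the stored-versus-printed distinction concrete, here is a minimal sketch that reuses the data from your session. The seed and the variable names are my additions, and the digits values are only illustrative:

```r
# Same kind of data as in the original post; set.seed() is added
# here only so that the example is reproducible.
set.seed(1)
x <- rnorm(500, mean = 30, sd = 3)
y <- rnorm(500, mean = 8000, sd = 6)

wt <- wilcox.test(x, y, alternative = "less")

class(wt)                  # "htest", so print.htest() handles the display
print(wt)                  # formatted report; the p-value line reads "< ..."

p <- wt$p.value            # the stored value, an ordinary double
p > 0                      # TRUE: it has not been rounded to zero
p < .Machine$double.eps    # TRUE: which is why it is printed as "< ..."

# The same stored number rendered in different ways; only the text differs.
format.pval(p)             # what a p-value formatter shows
format(p, digits = 3)      # general formatting, few significant digits
format(p, digits = 15)     # general formatting, many significant digits
```

So picking the value up from wt$p.value is the right way to get at the number itself. The printed report is just one rendering of it.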

You might also want to look at ?.Machine, which documents the numerical characteristics of your platform, including the value of .Machine$double.eps.
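
As a rough illustration of where the threshold comes from and how to work around it, the sketch below uses the p-value from your session. The other two p-values and the eps value of 1e-300 are made up for the example:

```r
# Machine constants relevant to the display threshold
.Machine$double.eps     # about 2.220446e-16 on IEEE 754 platforms
.Machine$double.xmin    # smallest normalized positive double, about 2.2e-308

p <- 2.928121e-165      # the stored p-value from the quoted session

# Default behaviour: anything below eps is reported as "< eps"
format.pval(p)

# Lowering eps (1e-300 is an arbitrary choice) shows the number itself
format.pval(p, eps = 1e-300)

# The stored values can be compared and ranked directly, regardless of
# how they would be printed. The two extra p-values are hypothetical.
pvals <- c(geneA = 1e-20, geneB = p, geneC = 1e-200)
rank(pvals)
sort(pvals)
```

Since ordering only needs comparisons between the stored doubles, it is unaffected by the print threshold. Whether differences between such tiny p-values are meaningful is a separate, statistical question.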

HTH,

Marc Schwartz

On Jul 14, 2010, at 10:49 AM, Govind Chandra wrote:

> Hi Peter,
> 
> Thanks for your response. Yes, I am interested in P-values smaller
> than 1e-16. Below a certain value they may not tell much about
> significance but are useful for ordering (ranking), for example,
> differentially expressed genes in microarray data.  Something similar
> is done by sequence similarity searching tools such as BLAST (although
> they use expect values not P-values) to rank hits to a database. To me
> this is practically useful and harmless.
> 
> I am not a statistician but I use statistics and wish to avoid
> misusing it unknowingly or knowingly. Hence the query.
> 
> I would still like to know why there is this difference between
> the P-value printed on the console and that stored in the returned
> object.
> 
> Govind
> 
> 
> 
> On Wed, Jul 14, 2010 at 02:32:39PM +0100, Peter Ehlers wrote:
>> On 2010-07-14 3:53, Govind Chandra wrote:
>>> Hi,
>>> 
>>> I find that the p-value printed out by wilcox.test() and the p-value
>>> stored in the p.value attribute in the object returned by
>>> wilcox.test() are not the same. There seems to be a lower limit of
>>> 2.2e-16 for the printed value although it does say that it is less
>>> than that. What I want to know is the reason for the lower limit in
>>> the printed value of p-value and also whether I am doing the right
>>> thing by picking up the p-value from the p.value attribute of the
>>> returned object. An example R session is pasted below (although the
>>> test is probably not the right one for the kind of data).
>>> 
>>> > x <- rnorm(500, mean = 30, sd = 3);
>>> > y <- rnorm(500, mean = 8000, sd = 6);
>>> > wilcox.test(x, y, alternative = "l");
>>> 
>>>         Wilcoxon rank sum test with continuity correction
>>> 
>>> data:  x and y
>>> W = 0, p-value < 2.2e-16
>>> alternative hypothesis: true location shift is less than 0
>>> 
>>> > wt <- wilcox.test(x, y, alternative = "l");
>>> > wt$p.value;
>>> [1] 2.928121e-165
>> 
>> Are you really interested in P-values smaller than 10^(-16)?
>> Why? A reported P-value of 3e-165 is certainly not accurate
>> to 165 decimal places and should perhaps be reported as zero,
>> as t.test() does.
>> 
>> As to your example: there is no sense at all in doing a
>> test on such data (other than to satisfy some hypothetical
>> fanatical journal editor).
>> 
>>   -Peter Ehlers
>> 
>> 
>>> 
>>> My version for R is 2.11.1 (2010-05-31) running on x86_64 GNU/Linux
>>> (RHEL).
>>> 
>>> Thanks in advance for any help with this.
>>> 
>>> Govind