[R] about a p-value < 2.2e-16

Spencer Graves spencer.graves using effectivedefense.org
Fri Mar 19 15:57:08 CET 2021



On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> After digging into the R source, it turns out that the argument `exact` has
> nothing to do with numeric precision. It only affects the statistical
> model used to compute the p-value. When `exact=TRUE`, the true distribution
> of the statistic is used. Otherwise, a normal approximation is used.
>
> I think the documentation needs to be improved here: you can compute the
> exact p-value *only* when there are no ties in your data. If you
> have ties, you will get the p-value from the normal
> approximation no matter what value you pass for `exact`. This behavior should
> be documented, or a warning should be given when `exact=TRUE` and ties are
> present.
>
> FYI, if the exact p-value is requested, the `pwilcox` function is used to
> compute it. There are no details on how it computes the p-value, but
> its C code seems to build the probability table, so I assume it computes
> the exact p-value from the true distribution of the statistic, not a
> permutation or MC p-value.


       My example shows that it does NOT use Monte Carlo: repeated calls 
return the same p-value, so it must be using some fixed distribution.  I 
believe the term "exact" means that it uses the permutation distribution, 
though I could be mistaken.  If it's NOT a permutation distribution, I 
don't know what it is.


       Spencer
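
The permutation question can be checked on a tiny sample. The following is a minimal sketch, assuming no ties and a sample small enough to enumerate every assignment of ranks; the variable names (`rank_sets`, `U_perm`, `p_enum`, and so on) are illustrative only and do not come from the R sources.

```r
set.seed(1)
x <- rnorm(4)
y <- rnorm(5, 1)
m <- length(x); n <- length(y)

res <- wilcox.test(x, y, exact = TRUE)
W <- unname(res$statistic)

# Enumerate every way of giving m of the m + n ranks to the first sample
rank_sets <- combn(m + n, m)
U_perm <- colSums(rank_sets) - m * (m + 1) / 2

# Lower-tail probability: full enumeration versus pwilcox()
p_enum <- mean(U_perm <= W)
p_pwil <- pwilcox(W, m, n)

# Two-sided p-value formed by doubling the tail on the side of W
p_tail <- if (W > m * n / 2) mean(U_perm >= W) else mean(U_perm <= W)
p_two_sided <- min(1, 2 * p_tail)

# Repeated calls on the same data should agree exactly (no simulation)
same_each_time <- identical(res$p.value,
                            wilcox.test(x, y, exact = TRUE)$p.value)

print(all.equal(p_enum, p_pwil))
print(all.equal(p_two_sided, res$p.value))
print(same_each_time)
```

If the three checks come back TRUE, then for data without ties the "true distribution of the statistic" and the permutation distribution of the ranks are the same thing, computed from a table rather than by simulation.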
>
> Best,
> Jiefei
>
>
>
> On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 using gmail.com> wrote:
>
>> Hey,
>>
>> I just want to point out that the word "exact" has two meanings. It can
>> mean the numerically accurate p-value, as Bogdan asked in his first email,
>> or it can mean the p-value calculated from the exact distribution of the
>> statistic (in this case, the U statistic). These two are actually not
>> related, even though both are called "exact".
>>
>> Best,
>> Jiefei
>>
>> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
>> spencer.graves using effectivedefense.org> wrote:
>>
>>>
>>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
>>>> thanks a lot, Vivek ! in other words, assuming that we work with 1000
>>>> data points,
>>>>
>>>> shall we use EXACT = TRUE, it uses the normal approximation,
>>>>
>>>> while if EXACT=FALSE (for these large samples), it does not ?
>>>
>>>         As David Winsemius noted, the documentation is not clear.
>>> Consider the following:
>>>
>>> > set.seed(1)
>>> > x <- rnorm(100)
>>> > y <- rnorm(100, 2)
>>> >
>>> > wilcox.test(x, y)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y)$p.value
>>> [1] 1.172189e-25
>>> >
>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, exact=TRUE)$p.value
>>> [1] 4.123875e-32
>>> > wilcox.test(x, y, exact=TRUE)$p.value
>>> [1] 4.123875e-32
>>> >
>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, exact=FALSE)$p.value
>>> [1] 1.172189e-25
>>> > wilcox.test(x, y, exact=FALSE)$p.value
>>> [1] 1.172189e-25
>>>
>>> We get two values here: 1.172189e-25 and 4.123875e-32. The first one,
>>> I think, is the normal approximation, which is the same as exact=FALSE.
>>> I think that with exact=TRUE, you get a permutation distribution, though
>>> I'm not sure. You might try looking at "wilcox_test in package coin for
>>> exact, asymptotic and Monte Carlo conditional p-values, including in the
>>> presence of ties" to see if it is clearer.
>>>
>>> NOTE: R is case sensitive, so "EXACT" is a different name from "exact".
>>> It is treated as an extra argument that wilcox.test does not recognize,
>>> so it is ignored in this context.
>>>            Hope this helps.
>>>            Spencer
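
The case-sensitivity point can be checked directly. This is a minimal sketch, assuming the default method of `wilcox.test` silently accepts unknown arguments through `...` (which is what the output above suggests); the sample size of 60 is an arbitrary choice that keeps the default on the normal approximation, and the variable names are illustrative.

```r
set.seed(1)
x <- rnorm(60)
y <- rnorm(60, 2)

# With 50 or more values per sample, the default is the normal approximation
p_default    <- wilcox.test(x, y)$p.value

# Misspelled argument: not matched to `exact`, so it has no effect
p_misspelled <- wilcox.test(x, y, EXACT = TRUE)$p.value

# Correctly spelled argument: requests the exact distribution
p_exact      <- wilcox.test(x, y, exact = TRUE)$p.value

print(identical(p_default, p_misspelled))
print(c(default = p_default, exact = p_exact))
```

If the first line prints TRUE while the two printed p-values differ, the uppercase spelling was ignored and only `exact = TRUE` changed the computation.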
>>>
>>>
>>>> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind using gmail.com> wrote:
>>>>
>>>>> Hi Bogdan,
>>>>>
>>>>> You can also get the information from the wilcox.test function
>>>>> documentation page.
>>>>>
>>>>> “By default (if exact is not specified), an exact p-value is computed if
>>>>> the samples contain less than 50 finite values and there are no ties.
>>>>> Otherwise, a normal approximation is used.”
>>>>>
>>>>> For more:
>>>>>
>>>>>
>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
>>>>> Hope this helps!
>>>>>
>>>>> Best,
>>>>>
>>>>> VD
>>>>>
>>>>>
>>>>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa using gmail.com> wrote:
>>>>>> Dear Peter, thanks a lot. yes, we can see a very precise p-value, and that
>>>>>> was the request from the journal.
>>>>>>
>>>>>> if I may ask another question please : what is the meaning of "exact=TRUE"
>>>>>> or "exact=FALSE" in wilcox.test ?
>>>>>>
>>>>>> i can see that the "numerically precise" p-values are different. thanks a
>>>>>> lot !
>>>>>>
>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>>>>>> tst$p.value
>>>>>> [1] 8.535524e-25
>>>>>>
>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
>>>>>> tst$p.value
>>>>>> [1] 3.448211e-25
>>>>>>
>>>>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
>>>>>> peter.langfelder using gmail.com> wrote:
>>>>>>
>>>>>>> I think the answer is much simpler. The print method for hypothesis
>>>>>>> tests (class htest) truncates the p-values. In the above example,
>>>>>>> instead of using
>>>>>>>
>>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>>>>>>>
>>>>>>> and copying the output, just print the p-value:
>>>>>>>
>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>>>>>>> tst$p.value
>>>>>>>
>>>>>>> [1] 2.988368e-32
>>>>>>>
>>>>>>>
>>>>>>> I think this value is what the journal asks for.
>>>>>>>
>>>>>>> HTH,
>>>>>>>
>>>>>>> Peter
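
The truncation described above can be reproduced without copying printed output. This is a minimal sketch, assuming the printed "< 2.2e-16" comes from `format.pval` with its default threshold `eps = .Machine$double.eps`; the names `tst` and `p_shown` are illustrative.

```r
set.seed(1)
tst <- wilcox.test(rnorm(100), rnorm(100, 2), exact = FALSE)

# Threshold below which p-values are displayed as "< eps"
print(.Machine$double.eps)

# What a formatted summary shows for this p-value
p_shown <- format.pval(tst$p.value)
print(p_shown)

# The value actually stored in the result, which can be reported directly
print(tst$p.value)
print(format(tst$p.value, digits = 3, scientific = TRUE))
```

The stored value is small but not zero, so reporting `tst$p.value` (to a sensible number of digits) is more accurate than writing "p-value = 0".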
>>>>>>>
>>>>>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
>>>>>>> <spencer.graves using effectivedefense.org> wrote:
>>>>>>>>          I would push back on that from two perspectives:
>>>>>>>>
>>>>>>>>
>>>>>>>>                1.  I would study exactly what the journal said very
>>>>>>>> carefully.  If they mandated "wilcox.test", that function has an
>>>>>>>> argument called "exact".  If that's what they are asking, then using
>>>>>>>> that argument gives the exact p-value, e.g.:
>>>>>>>>
>>>>>>>>
>>>>>>>>    > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>>>>>>>>
>>>>>>>>            Wilcoxon rank sum exact test
>>>>>>>>
>>>>>>>> data:  rnorm(100) and rnorm(100, 2)
>>>>>>>> W = 691, p-value < 2.2e-16
>>>>>>>>
>>>>>>>>
>>>>>>>>                2.  If that's NOT what they are asking, then I'm not
>>>>>>>> convinced what they are asking makes sense:  There is no such thing
>>>>>>>> as an "exact p value" except to the extent that certain assumptions
>>>>>>>> hold, and all models are wrong (but some are useful), as George Box
>>>>>>>> famously said years ago.[1]  Truth only exists in mathematics, and
>>>>>>>> that's because it's a fiction to start with ;-)
>>>>>>>>
>>>>>>>>
>>>>>>>>          Hope this helps.
>>>>>>>>          Spencer Graves
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://en.wikipedia.org/wiki/All_models_are_wrong
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
>>>>>>>>>     <
>>> https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> i would appreciate having your advice on the following please :
>>>>>>>>>
>>>>>>>>> in R, the wilcox.test() provides "a p-value < 2.2e-16", when we compare
>>>>>>>>> sets of 1000 genes expression (in the genomics field).
>>>>>>>>>
>>>>>>>>> however, the journal asks us to provide the exact p value ...
>>>>>>>>>
>>>>>>>>> would it be legitimate to write : "p-value = 0" ? thanks a lot,
>>>>>>>>>
>>>>>>>>> -- bogdan
>>>>>>>>>
>>>>>>>>>         [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>> --
>>>>> ----------------------------------------------------------
>>>>>
>>>>> Vivek Das, PhD
>>>>>


