[Rd] inconsistency in pchisq (PR#7099)

maechler at stat.math.ethz.ch maechler at stat.math.ethz.ch
Sat Jul 17 13:21:45 CEST 2004


>>>>> "Richard" == Richard Mott <Richard.Mott at well.ox.ac.uk>
>>>>>     on Sat, 17 Jul 2004 11:12:23 +0100 (BST) writes:

    Richard> Martin - I agree that the p-values are essentially
    Richard> identical. i raised the issue becuase i can get
    Richard> negative p-values with the ncp=0 version. It is
    Richard> hard to reproduce this problem in the sense that
    Richard> the value of the chi-squared statistic that causes
    Richard> the phenomenom is identical to the one i sent you
    Richard> to 5 sig figures - i don't know how to print the
    Richard> full value in order to send it to you. Negative
    Richard> small probabilities could of course be treated as
    Richard> 0, but this creates problems when i take logs

sure, and much more of a problem, i.e., a clear bug.
--> so after all your bug report was well valid [-> CC'ed back to R-bugs]

To see it more extremely, try the following :

 > curve(pchisq(x, df=1 ,       lower=FALSE), 65, 70, ylim=c(-1,4)*4e-16, col=2)
 > curve(pchisq(x, df=1, ncp=0, lower=FALSE), 65, 70, add=TRUE)

The reason for this behavior is simply that internal pnchisq(*, lower=FALSE)
at the moment simply is equivalent to  1 - pnchisq(*, lower=TRUE), 
and since the computer epsilon is 2e-16 it's no wonder that
cancellation swamps everything for these extreme abscissa values.

Note that we could easily change the case  'ncp=0' to use the
central chisq,  but that's not the case for e.g. ncp = 0.001.

==> pchisq(*, ncp) definitely needs to be improved.

I have code to do it (using "Wiener germ approximations"), but not
finished testing it yet.

    Richard> If you like i can send you the data and program
    Richard> used to generate the problem - the chi-squared
    Richard> value is generated from a call to glm()

    Richard> Richard

    >>>>>>> "Richard" == Richard Mott <rmott at well.ox.ac.uk> on
    >>>>>>> Sat, 17 Jul 2004 01:42:18 +0100 writes:
    >>
    Richard> Martin - thanks - so is it always better to use
    Richard> pchisq(67.60644,df=1,lower.tail=F) if indeed ncp=0
    Richard> ?
    >>  Yes, (for the algorithms currently in use; and as I
    >> said, these are destined to be improved).
    >> 
    >> However I do wonder: In which situations would it matter
    >> to have P = 2e-16 vs P = 3e-15 ??
    >> 
    >> Very often everything depends on underlying model
    >> assumptions, and such extreme tail probabilities are
    >> typically extremely dependent, i.e. would vary heavily by
    >> small changes in the underlying model.
    >> 
    >> Regards, Martin



More information about the R-devel mailing list