[Rd] results of pnorm as either NaN or Inf

efreeman at berkeley.edu
Fri May 14 17:38:28 CEST 2010


Thank you for your responses. 

I presented a minimal example of the issue, but I should have explained
that this came up in the context of maximizing a log likelihood function
(with optim). I certainly agree that there would be no good reason for a
human to evaluate the function pnorm(-x, log.p=TRUE) for large x. However,
I would suggest that one may want the function to return a value of -Inf
for large x, rather than issue an error or a warning, as I'm guessing your
alteration of the C code would do. Returning a value of -Inf would allow an
optimization routine (or a pre-optimization-routine grid search) to know
that it's going into (or is currently in) the wrong region. I can
definitely see the counterargument that someone using an optimization
routine should think about the function they are optimizing more carefully,
especially given that there could be numerical issues. I am probably not
the right person to judge, since I know neither the programming issues nor
much about numerical optimization, but I wanted to offer a possible
argument for returning -Inf in case it hasn't already been considered, and
others can evaluate the issue. (In general, my earlier purpose was just to
point out what I thought might be an inconsistency in case people wanted to
change it. I'm not concerned about it for my own usage at all, since I can
just adjust my own programs.)
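[The guard Eric describes can be sketched outside R. Below is a minimal
illustration in Python (the function names log_pnorm_lower and neg_log_lik
are hypothetical, not R's API): the log-CDF falls back to the leading
asymptotic term for extreme arguments, so it returns -inf rather than NaN,
and the objective function maps that to +inf so a minimizer or grid search
can recognize a bad region instead of failing.]

```python
import math

def log_pnorm_lower(x):
    """Illustrative log Phi(x): direct computation where erfc is
    representable, else the leading asymptotic term
    -x^2/2 - log(-x) - log(2*pi)/2 (never NaN; degrades to -inf)."""
    if x > -25.0:
        return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))
    # note: x * x may overflow to inf, giving -inf overall, never NaN
    return -0.5 * x * x - math.log(-x) - 0.5 * math.log(2.0 * math.pi)

def neg_log_lik(theta, data):
    """Toy probit-style negative log-likelihood for a minimizer."""
    total = 0.0
    for x in data:
        lp = log_pnorm_lower(x - theta)
        if not math.isfinite(lp):      # -inf: signal "wrong region"
            return math.inf
        total -= lp
    return total
```

With this guard, neg_log_lik(0.0, [-1e308]) is +inf rather than NaN, which
is exactly the distinction Eric is asking about.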

Thank you, and to all, thank you for all of your work on R.

Eric

On Fri, 14 May 2010 11:50:09 +0100 (BST), Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> The answer to pnorm(-x, log.p=TRUE) is of course about -0.5*x*x for 
> large x.  So trying to compute it for -1e308 is a bit silly, both on 
> your part and the C code used.  I've altered the C code not to attempt 
> it for x > 1e170, which excludes the area where underflow occurs.
> 
> On Fri, 14 May 2010, Ted.Harding at manchester.ac.uk wrote:
> 
>> On 13-May-10 20:04:50, efreeman at berkeley.edu wrote:
>>> I stumbled across this and I am wondering if this is unexpected
>>> behavior or if I am missing something.
>>>
>>>> pnorm(-1.0e+307, log.p=TRUE)
>>> [1] -Inf
>>>> pnorm(-1.0e+308, log.p=TRUE)
>>> [1] NaN
>>> Warning message:
>>> In pnorm(q, mean, sd, lower.tail, log.p) : NaNs produced
>>>> pnorm(-1.0e+309, log.p=TRUE)
>>> [1] -Inf
>>>
>>> I don't know C and am not that skilled with R, so it would be hard
>>> for me to look into the code for pnorm. If I'm not just missing
>>> something, I thought it may be of interest.
>>>
>>> Details:
>>> I am using Mac OS X 10.5.8. I installed a precompiled binary version.
>>> Here is the output from sessionInfo(), requested in the posting guide:
>>> R version 2.11.0 (2010-04-22)
>>> i386-apple-darwin9.8.0
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.11.0
>>>
>>> Thank you very much,
>>>
>>> Eric Freeman
>>> UC Berkeley
>>
>> This is probably platform-independent. I get the same results with
>> R on Linux. More to the point:
>>
>> You are clearly "pushing the envelope" here. First, have a look
>> at what R makes of your inputs to pnorm():
>>
>>  -1.0e+307
>>  # [1] -1e+307
>>  -1.0e+308
>>  # [1] -1e+308
>>  -1.0e+309
>>  # [1] -Inf
>>
>>
>> So, somewhere between -1e+308 and -1.0e+309, the envelope burst!
>> Given -1.0e+309, R returns -Inf (i.e. R can no longer represent
>> this internally as a finite number).
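[Ted's "bursting envelope" is the IEEE 754 double-precision limit, which R
shares with every language using native doubles. A quick check in Python:
the largest finite double is about 1.7976931348623157e308, so 1e308 is
representable but 1e309 parses to infinity.]

```python
import math
import sys

print(sys.float_info.max)   # largest finite double, about 1.7977e308
x = float("1e308")          # still representable as a finite double
y = float("1e309")          # beyond the limit: parses to inf
print(x, y)
```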
>>
>> Now look at
>>
>>  pnorm(-Inf,log.p=TRUE)
>>  # [1] -Inf
>>
>> So, R knows how to give the correct answer (an exact 0, or -Inf
>> on the log scale) if you feed pnorm() with -Inf. So you're OK
>> with -1e+N where N >= 309.
>>
>> For smaller powers, e.g. -1e+(200:306), the arguments passed to
>> pnorm() are still representable finite numbers, and presumably R's
>> algorithm (which I haven't studied either ...) returns 0 for pnorm()
>> (-Inf on the log scale), as it should to the available internal
>> accuracy.
>>
>> So, up to pnorm(-1.0e+307, log.p=TRUE) = -Inf. All is as it should be.
>> However, at -1e+308, "the envelope is about to burst", and something
>> may occur within the algorithm which results in a NaN.
>>
>> So there is nothing anomalous about your results except at -1e+308,
>> which is where R is at a critical point.
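[A plausible mechanism for the NaN at -1e308, illustrated with plain IEEE
arithmetic rather than the actual pnorm C code: the argument itself is
representable, but squaring it overflows silently to infinity, and any
subsequent subtraction of infinities yields NaN instead of a finite value.]

```python
import math

x = -1e308        # representable as a double, but x*x is not
xsq = x * x       # IEEE multiply overflows silently to inf
diff = xsq - xsq  # inf - inf is NaN, not 0
print(xsq, diff)
```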
>>
>> That's how I see it, anyway!
>> Ted.
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 14-May-10                                       Time: 00:01:27
>> ------------------------------ XFMail ------------------------------
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
