[Rd] NA warnings for r<distr>() {aka "patch for random.c"}

Martin Maechler maechler at stat.math.ethz.ch
Tue Mar 11 18:07:35 CET 2008


>>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
>>>>>     on Tue, 11 Mar 2008 13:19:46 +0800 writes:

    BAT> G'day Martin and others,
    BAT> On Mon, 10 Mar 2008 12:06:01 +0100
    BAT> Martin Maechler <maechler at stat.math.ethz.ch> wrote:

    >> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
    >> >>>>>     on Fri, 7 Mar 2008 23:54:06 +0800 writes:
    >> 
    BAT> After all,
    >> 
    >> >> 1:2 + Inf
    BAT> [1] Inf Inf
    >> 
    BAT> does not create any warning either.  
    >> 
    >> but it doesn't give NA either.
    >> A bit more relevant (and I'm sure you meant implicitly)

    BAT> You give me more credit than I deserve. :)
    BAT> I was guided by what rexp() was doing when I chose that example, i.e.
    BAT> (potentially) warning about NAs being created when actually +/- Infs
    BAT> were created. In this thread I was already once or twice tempted to
    BAT> comment on the appropriateness of the warning message but for various
    BAT> reasonx always refrained from doing so.  

    >> BTW, I've also found that I  
    >> rnorm(n, mu=Inf) |--> NA  was not ok, and changed that to
    >> rnorm(n, mu=Inf) |--> Inf
    >> 
    >> More feedback is welcome,
    >> but please now "start" with R-devel rev >= 44715

    BAT> While I agree with this change, I think it opens a whole can of worms,

not a big can, though ....

    BAT> since it invites a rethink about all distributions.  At the moment
    BAT> we have:

    BAT> <snip>
    BAT> R version 2.7.0 Under development (unstable) (2008-03-10 r44730)
    BAT> <snip>
    >> set.seed(1) ; exp(rnorm(2))
    BAT> [1] 0.5344838 1.2015872
    >> set.seed(1) ; rlnorm(2)
    BAT> [1] 0.5344838 1.2015872

    >> set.seed(1) ; exp(rnorm(2, mean=c(-Inf,Inf)))
    BAT> [1]   0 Inf
    >> set.seed(1) ; rlnorm(2, mean=c(-Inf,Inf))
    BAT> [1] NaN NaN
    BAT> Warning message:
    BAT> In rlnorm(2, mean = c(-Inf, Inf)) : NAs produced

    BAT> The first two lines give identical results, as one could reasonably
    BAT> expect.  

Yes, but I don't think a user should *rely* on the way R
generates these random number; though I agree for the very
specific case of rlnorm(..).

    BAT> But the other two do not and I would argue that a user could
    BAT> reasonably expect that the commands in these two lines should lead to
    BAT> identical results.

They now do.

    BAT> Likewise, coming back to the one-parameter distribution used in this
    BAT> thread for illustration purposes, the *exp() functions in R are
    BAT> parameterised by the rate and an exponential random variable with rate
    BAT> r has mean (1/r) and variance (1/r^2).  Thus, I would argue that
    BAT> rexp(2, Inf) should return 0's and not NaN's, since in the limit
    BAT> a rate of Inf seems to refer to a variable with mean 0 and variance 0,
    BAT> i.e. the constant 0.  And it would also be "more consistent" with the
    BAT> behaviour of rexp() when the rate parameter is "large but finite".

yes.  And that's *the* (only) case where 'Inf' parameters or 'scale = 0'
situations should work: when it is a natural limit behavior.

    BAT> Then one can argue about rgeom() when p=0.  But I guess in that case
    BAT> the limit is a "random" variable with "infinite mean" and "infinite
    BAT> variance" and hence it is o.k. to return NaNs and not Infs.  After all,
    BAT> your comments in rnorm.c seem to indicate that you think that reporting
    BAT> +/- Inf back is only o.k. if the mean is +/- Inf but the variance is
    BAT> finite.

    BAT> I did not look at (or think about) extreme cases for any other
    BAT> distributions, but further discussion/feedback would indeed be helpful
    BAT> it seems. :)

I looked at a few more, basically all  continuous
src/nmath/r<foo>.c  ones.

    BAT> And the attached patch would address the behaviour of rlnorm() and
    BAT> rexp() that I have raised above.  With these changes, on my machine,
    BAT> "make check FORCE=FORCE" succeeds.

I don't think your change to  .../R/distn.R  was good,
but the others I have more or less committed together with a few
more similar ones.

    BAT> BTW, I was surprised to realise that the *exp() functions in the
    BAT> underlying C code use the opposite parameterisation from the
    BAT> corresponding functions at R level.  Perhaps it would be worthwhile to
    BAT> point this out in section 6.7.1 of the Writing R extension manual?  In
    BAT> particular since the manual states:

    BAT> Note that these argument sequences are (apart from the names and that
    BAT> @code{rnorm} has no @var{n}) exactly the same as the corresponding
    BAT> @R{} functions of the same name, so the documentation of the @R{}
    BAT> functions can be used.

    BAT> Well, as I noticed the hard way, for *exp() the documentation of the
    BAT> corresponding R functions cannot be used. ;-)

We often also gratefully accept patches for the documentation 
...

    BAT> Thanks for you patience.
;-)

Regards,
Martin



More information about the R-devel mailing list