[Rd] NA warnings for r<distr>() {aka "patch for random.c"}

Martin Maechler maechler at stat.math.ethz.ch
Fri Mar 7 15:01:26 CET 2008


Hi Berwin (and "listeners"),

>>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
>>>>>     on Wed, 5 Mar 2008 20:26:24 +0800 writes:

    BAT> G'day Martin, On Mon, 3 Mar 2008 10:16:45 +0100 Martin
    BAT> Maechler <maechler at stat.math.ethz.ch> wrote:

    >> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg> >>>>>
    >> on Fri, 29 Feb 2008 18:19:40 +0800 writes:
    >> 
    BAT> while looking for some inspiration of how to organise
    BAT> some code, I studied the code of random.c and noticed
    BAT> that for distributions with 2 or 3 parameters the user
    BAT> is not warned if NAs are created while such a warning
    BAT> is issued for distributions with 1 parameter. [...] The
    BAT> attached patch rectifies this.  [...]
    >> 
    >> I cannot imagine a design reason for that.  If there was,
    >> it should have been mentioned as a comment in the C code.
    >> 
    >> I'll commit your patch (if it passes the checks).

    BAT> Sorry, I was a bit in a hurry when writing the e-mail,
    BAT> so I forgot to mention that the source code modified by
    BAT> this patch compiles and passes "make check FORCE=FORCE"
    BAT> on my machine.

ok.

    BAT> And in my hurry, I also posted from my NUS account,
    BAT> without realising it, which forced you to intervene as
    BAT> moderator and to approve the posting.  My apologies for
    BAT> the extra work.  But this gave me the idea to also
    BAT> subscribe to r-devel with my NUS account and configure
    BAT> the subscriptions so that I only receive e-mail at my
    BAT> UWA account.  Thus, hopefully, you will not have to
    BAT> intervene again.  (Which this e-mail should test.)

(and it succeeded)
 
    BAT> BTW, there are other places in the code were NAs can be
    BAT> created but no warning is issued.  E.g:
    >> 
    >> >> rexp(2, rate=numeric())
    BAT> [1] NA NA
    >> >> rnorm(2, mean=numeric())
    BAT> [1] NA NA
    >> 
    BAT> I wonder whether a warning should be issued in this
    BAT> case too.
    >> 
    >> Yes, "should in principle".
    >> 
    >> If you feel like finding another elegant patch...

    BAT> Well, elegance is in the eye of the beholder. :-) 

    BAT> I attach two patches.  One that adds warning messages at
    BAT> the other places where NAs can be generated.

ok. The result is very simple ``hence elegant''.

But actually, part of the changed behavior may be considered
undesirable:

  rnorm(2, mean = NA)

which gives two NaN's  would now produce a warning,
where I could argue that 
   'arithmetic with NAs should give NAs without a warning'
since
  1:2 + NA
also gives NAs without a warning.

So we could argue that a warning should *only* be produced in a
case where the parameters of the distribution are not NA.

What do others (particularly R-core !) think?

    BAT> The second one in additiona rearranges the code a bit
    BAT> such that in the case when all the "vectors" that
    BAT> contain the parameter values of the distribution, from
    BAT> which one wants to simulate, are of length one some
    BAT> unnecessary calculations is taken out of the for loop.

    BAT> I am not sure how much time is actually saved in this
    BAT> situation, but I belong to the school that things such
    BAT> kind of optimisation should be done. :) 

I understand, but I think most of R-core are "from the school" 
that teaches
     "if you want to optimize, first *measure*"

    BAT> If you think it bloats the code too much (or duplicates
    BAT> things too much leading to hard to maintain code), then
    BAT> feel free to ignore this second patch.

For now, I will ignore this second patch.

 - it does bloat the code slightly  (as you conceded)
 - it uses an if(<Simple>) test which makes the code slightly
   *slower* for the (probably rare) case when the <Simple> is false.

but most importantly:

 - we have no idea if the speedup (when <Simple> is TRUE)
   is in the order of  10%, 1% or 0.1%
 
   My guess would be '0.1%' for rnorm(), and that would
   definitely not warrant the extra check.

    >> Thank you Berwin, for your contribution!

and thanks again!
Martin

    BAT> My pleasure.

    BAT> Cheers,
    BAT> 	Berwin



More information about the R-devel mailing list