[Rd] Bias in R's random integers?

Paul Gilbert pgilbert902 at gmail.com
Thu Sep 20 18:30:31 CEST 2018


On 09/19/2018 10:03 AM, Ben Bolker wrote:
...
>     Balancing backward compatibility and correctness is a tough problem
> here.  

I think improving the RNG is a case where backward 
compatibility is not really lost, because people can still specify 
the old generator; they just will not get it by default. My opinion is 
that the default generally needs to be the best option available, because 
too many people will expect that, or will not know better, in which 
case that is what they should get.

There are only two small problems that occur to me:

1/ Researchers who want reproducible results (all, I hope) need 
to be aware that the change has happened. In theory they should have recorded 
the RNG they were using, along with the seed (and, BTW, the number of 
nodes if they generate with a parallel generator). If they have not done 
that then they can figure out the RNG from knowing what version of R 
they used. If they haven't recorded that then they can figure it out by 
some experimentation and knowing roughly when they did the research. If 
none of this works then the research probably should be lost.
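
A minimal sketch of the record-keeping described above, using only base R. The seed value and the name rng_record are hypothetical, chosen for illustration; the idea is to store the R version, the generator names and the seed alongside the results so the draws can be regenerated later.

```r
# Record what is needed to regenerate a simulation later.
seed <- 20180920                  # hypothetical seed value
set.seed(seed)
rng_record <- list(
  r_version = R.version.string,   # which R produced the results
  kind      = RNGkind(),          # character vector naming the generators in use
  seed      = seed
)
x <- rnorm(5)

# Later: restore the recorded generators, reseed, and regenerate.
RNGkind(kind = rng_record$kind[1], normal.kind = rng_record$kind[2])
set.seed(rng_record$seed)
y <- rnorm(5)
stopifnot(identical(x, y))        # same generators + same seed => same draws
```

With a parallel generator the number of nodes would have to be added to the record as well; that case is not covered by this sketch.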

As an exercise, researchers might also want to experiment with whether 
the new default qualitatively changes their results. That might lead to 
publishable research, so no one should complain.

2/ Package maintainers who have used the default RNG to generate tests 
may need to change their tests to specify the old generator, or modify 
results used for comparisons in the tests. Since package testing is 
usually for code checking rather than statistical results, not using the 
best available generator is not usually an issue.
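
One way a test could stop depending on the default is to name the generators explicitly. The helper below, with_pinned_rng, is a hypothetical name and only a sketch under the assumption that base R's RNGkind() and set.seed() are all that is needed; it is not taken from any package.

```r
# Hypothetical helper: evaluate `expr` under explicitly named generators and
# a fixed seed, then put the caller's generator settings back.
with_pinned_rng <- function(expr, seed, kind = "Mersenne-Twister",
                            normal.kind = "Inversion") {
  # RNGkind() returns the previously selected kinds when it changes them.
  old <- RNGkind(kind = kind, normal.kind = normal.kind)
  on.exit(RNGkind(kind = old[1], normal.kind = old[2]), add = TRUE)
  set.seed(seed)
  expr   # lazily evaluated here, after the generators are pinned
}

a <- with_pinned_rng(rnorm(3), seed = 1)
b <- with_pinned_rng(rnorm(3), seed = 1)
stopifnot(identical(a, b))   # does not depend on the session's defaults
```

A test written this way keeps producing the same comparison values when the default generator changes, at the cost of not exercising the new default.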

Most of my own package testing already specifies the generator; much of it 
uses "buggy Kinderman-Ramage" because the tests were set up a long time ago. 
I will have to change package setRNG, which warns when the default 
generator changes. (This warning is intentional because I was bitten 
badly by a small change in the S generator circa 1990.)


> If this goes into base R, what's the best way to do it?  What was
> the protocol for migrating away from the "buggy Kinderman-Ramage"
> generator, back in the day?   (Version 1.7 was sometime between 2001 and
> 2004).

I think there may have been a change in R 0.99 too. At least my notes 
suggest that the code I changed for R 1.7.0 had worked with the default 
generator from R 0.99 to 1.6.2.

I don't recall the protocol; I think it just happened and was announced 
in the NEWS. (Has this protocol changed?) The ramification for me was 
that I had to go through all of my packages' testing and change the name 
of the explicitly specified RNG to "buggy Kinderman-Ramage".
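
For illustration, selecting the old normal generator by name looks roughly like this in base R. This is a sketch, not the code from my packages; the seed and variable names are hypothetical. Note that the name goes through RNGkind(), since the documentation says it is not accepted by set.seed.

```r
# Select the pre-1.7.0 normal generator by its current name, leaving the
# uniform generator unchanged, and remember the previous settings.
old <- RNGkind(normal.kind = "Buggy Kinderman-Ramage")
stopifnot(RNGkind()[2] == "Buggy Kinderman-Ramage")

set.seed(123)          # hypothetical seed; set.seed keeps the selected normal.kind
z <- rnorm(4)          # normals drawn with the old generator

# Restore whatever normal generator was selected before.
RNGkind(normal.kind = old[2])
```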

Perhaps there does need to be a protocol for testing before release. 
When my package setRNG fails then many of my other packages will also 
fail because they depend on it. This is a simple fix but reverse 
dependencies may make it look like lots of things are broken.

Paul Gilbert

>    I couldn't find the exact commit in the GitHub mirror: this is related ...
> 
> https://github.com/wch/r-source/commit/7ad3044639fd1fe093c655e573fd1a67aa7f55f6#diff-dbcad570d4fb9b7005550ff630543b37
> 
> 
> 
> ===
> ‘normal.kind’ can be ‘"Kinderman-Ramage"’, ‘"Buggy
>       Kinderman-Ramage"’ (not for ‘set.seed’), ‘"Ahrens-Dieter"’,
>       ‘"Box-Muller"’, ‘"Inversion"’ (the default), or ‘"user-supplied"’.
>       (For inversion, see the reference in ‘qnorm’.)  The
>       Kinderman-Ramage generator used in versions prior to 1.7.0 (now
>       called ‘"Buggy"’) had several approximation errors and should only
>       be used for reproduction of old results.
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>