[Rd] A bug in the R Mersenne Twister (RNG) code?

Gabriel Becker gmbecker at ucdavis.edu
Thu Sep 1 17:34:31 CEST 2016


I wonder how useful a (set of?) "time machine" function(s) that look up or
infer things like this based on a date would be. They could ease the pain of
changes generally, though not remove it completely.
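
A rough sketch of the idea, in Python for brevity. The function name
default_rng_on and the table are hypothetical, and the release dates are
recalled from the R release history, so treat them as assumptions to be
checked against the R NEWS files. Note that a date only bounds which release
was available; it does not prove which release someone actually ran.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical lookup table: (release date, R version, uniform RNG, normal RNG).
# ASSUMPTION: the dates are from memory; verify them before relying on this.
_DEFAULTS = [
    (date(2000, 2, 7), "0.99.0", "Marsaglia-Multicarry", "Buggy Kinderman-Ramage"),
    (date(2003, 4, 16), "1.7.0", "Mersenne-Twister", "Inversion"),
]


def default_rng_on(day):
    """Return (version, kind, normal_kind) for the newest table entry released
    on or before `day`, or None if `day` predates the table."""
    release_dates = [released for released, *_ in _DEFAULTS]
    idx = bisect_right(release_dates, day) - 1
    if idx < 0:
        return None
    _, version, kind, normal_kind = _DEFAULTS[idx]
    return version, kind, normal_kind


if __name__ == "__main__":
    # Date of this message.
    print(default_rng_on(date(2016, 9, 1)))
```
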

~G

On Wed, Aug 31, 2016 at 5:45 PM, Paul Gilbert <pgilbert902 at gmail.com> wrote:

>
>
> On 08/30/2016 06:29 PM, Duncan Murdoch wrote:
>
>> I don't see evidence of a bug.  There have been several versions of the
>> MT; we may be using a different version than you are.  Ours is the
>> 1999/10/28 version; the web page you cite uses one from 2002.
>>
>> Perhaps the newer version fixes some problems, and then it would be
>> worth considering a change.  But changing the default RNG definitely
>> introduces problems in reproducibility,
>>
>
> Well, "problems in reproducibility" is a bit vague. Results would always be
> reproducible by specifying kind="Mersenne-Twister", or
> normal.kind="Buggy Kinderman-Ramage" for older results, so there is no problem reproducing
> results. The only problem is that users expecting to reproduce results
> twenty years later will need to know what random generator they used. (BTW,
> they may also need to record information about the normal or other
> generator, as well as the seed.) Of course, these changes are recorded
> pretty well for R, so the history of "default" can always be found.
>
> I think it is a mistake to encourage users into thinking they do not need
> to keep track of some information if they want reproducibility. Perhaps the
> default should be changed more often in order to encourage better user
> habits.
>
> More seriously, I think "default" should continue to be something that is
> currently considered to be good. So, if there really is a known problem,
> then I think "default" should be changed.
>
> (And no, I did not get burned by the R 1.7.0 change in the default
> generator. I got burned by a much earlier, unadvertised, and more subtle
> change in the Splus generator.)
>
> Paul Gilbert
>
>
>> so it's not obvious that we would do it.
>>
>> Duncan Murdoch
>>
>>
>> On 30/08/2016 5:45 PM, Mark Roberts wrote:
>>
>>> Whomever,
>>>
>>> I recently sent the "bug report" below to R-core at r-project.org and have
>>> just been asked to instead submit it to you.
>>>
>>> Although I am basically not an R user, I have installed version 3.3.1
>>> and am also the author of a statistics program written in Visual Basic
>>> that contains a component which correctly implements the Mersenne
>>> Twister (MT) algorithm.  I believe that it is not possible to generate
>>> the correct stream of pseudorandom numbers using the MT default random
>>> number generator in R, and I am not the first person to notice this.  A
>>> 2013 post (www.r-bloggers.com/reproducibility-and-randomness/) on an R
>>> website asserts that the SAS implementation of the MT algorithm produces
>>> different numbers than R does when given the same starting seed.  The
>>> author of that post didn’t get anyone to respond to his query about the
>>> reason for this SAS vs. R discrepancy.
>>>
>>> There are two ways of initializing the original MT computer program
>>> (written in C) so that an identical stream of numbers can be repeatedly
>>> generated:  1) with a particular integer seed number, and 2) with a
>>> particular array of integers.  In the 'compilation and usage' section
>>> of this webpage (https://github.com/cslarsen/mersenne-twister) there is
>>> a listing of the first 200 random numbers the MT algorithm should
>>> produce for seed number = 1.  The inventors of the Mersenne Twister
>>> random number generator provided two different sets of the first 1000
>>> numbers produced by a correctly coded 32-bit implementation of the MT
>>> algorithm when initializing it with a particular array of integers at:
>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
>>> [There is a link to this output at:
>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
>>>
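
For reference, the two initialization routes described above can be sketched
in a few lines of Python. This is a minimal sketch that follows the structure
of the 2002 reference code mt19937ar.c (init_genrand, init_by_array,
genrand_int32); the class name MT19937 is only for illustration. It says
nothing about how R's set.seed() maps a seed onto the generator state.

```python
N, M = 624, 397
MATRIX_A = 0x9908B0DF      # constant vector a
UPPER_MASK = 0x80000000    # most significant bit
LOWER_MASK = 0x7FFFFFFF    # least significant 31 bits
MASK32 = 0xFFFFFFFF


class MT19937:
    """Illustrative 32-bit Mersenne Twister, modelled on mt19937ar.c (2002)."""

    def __init__(self):
        self.mt = [0] * N
        self.init_genrand(5489)  # default seed used by the reference code

    def init_genrand(self, seed):
        """Route 1: initialize the state from a single integer seed."""
        self.mt[0] = seed & MASK32
        for i in range(1, N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & MASK32
        self.index = N  # force a state regeneration on the first draw

    def init_by_array(self, key):
        """Route 2: initialize the state from an array of integers."""
        self.init_genrand(19650218)
        mt = self.mt
        i, j = 1, 0
        for _ in range(max(N, len(key))):
            prev = mt[i - 1]
            mt[i] = ((mt[i] ^ ((prev ^ (prev >> 30)) * 1664525)) + key[j] + j) & MASK32
            i += 1
            j += 1
            if i >= N:
                mt[0] = mt[N - 1]
                i = 1
            if j >= len(key):
                j = 0
        for _ in range(N - 1):
            prev = mt[i - 1]
            mt[i] = ((mt[i] ^ ((prev ^ (prev >> 30)) * 1566083941)) - i) & MASK32
            i += 1
            if i >= N:
                mt[0] = mt[N - 1]
                i = 1
        mt[0] = 0x80000000  # guarantees a non-zero initial state

    def _twist(self):
        """Regenerate all N words of the state in place."""
        mt = self.mt
        for k in range(N):
            y = (mt[k] & UPPER_MASK) | (mt[(k + 1) % N] & LOWER_MASK)
            mt[k] = mt[(k + M) % N] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)
        self.index = 0

    def genrand_int32(self):
        """Return the next 32-bit unsigned output."""
        if self.index >= N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # Tempering
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & MASK32


if __name__ == "__main__":
    g = MT19937()
    g.init_genrand(1)
    print([g.genrand_int32() for _ in range(5)])
    g.init_by_array([0x123, 0x234, 0x345, 0x456])
    print([g.genrand_int32() for _ in range(5)])
```

With this sketch, init_genrand(1) gives 1791095845 as its first output, and
init_by_array([0x123, 0x234, 0x345, 0x456]) (the key used in the reference
code's test driver) gives 1067595299. Those are the first values in the two
reference listings cited above.
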
>>> My statistics program obtains exactly those 200 numbers from the first
>>> site mentioned in the previous paragraph and also reproduces the values
>>> listed on the second website (though I didn't check all 2000 values).
>>> Assuming that the MT code within R uses the 32-bit MT algorithm, I
>>> suspect that the current version of R can't do that.  If you (i.e.,
>>> anyone who might knowledgeably respond to this report) are able to
>>> duplicate those reference test-values, then please send me the R code to
>>> initialize the MT code within R to successfully do that, and I apologize
>>> for having wasted your time. If you (collectively) can't do that, then R
>>> is very likely using incorrectly implemented MT code.  And if this
>>> latter possibility is true, it seems to me that this is something that
>>> should be fixed.
>>>
>>> Mark Roberts, Ph.D.
>>>
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>



-- 
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research
