[Rd] [R] RNG Cycle and Duplication (PR#12540)

Shengqiao Li shli at stat.wvu.edu
Sun Aug 17 00:09:58 CEST 2008


Knuth's double version RNG rng-double.c dose a great job. No ties were 
observed for 10M numbers ( totally 2^52 possible different values?). 
In rng-double, double modulo mod_sum replaced the integer version mod_diff 
in the integer version rng.c that is adopted by R.

The integer version uses modulus 2^30. Therefore there are only 2^30 
distinct numbers, which is confirmed by my previous test in R.

If someday Knuth's double version is also included in R, it will be great.


Shengqiao Li


On Fri, 15 Aug 2008, Duncan Murdoch wrote:

> On 15/08/2008 10:28 AM, Shengqiao Li wrote:
>> Thank you for your reply and for your suggestion. So the note in man page 
>> could be more accurate since for an end user, man page should be more 
>> helpful and source code is mainly for developers.
>> 
>> I was also adviced to use Knuth's  double version RANARRAY from
>> http://www-cs-faculty.stanford.edu/~knuth/programs.html instead of the 
>> integer versions in R. I'm a R user. So why not also include the double 
>> verion in R implementation?
>
> You can try it using kind="user-supplied" if you like, but I suspect it's the 
> same as "Knuth-TAOCP-2002".
>
> Duncan Murdoch
>
>> 
>> Thanks again,
>> 
>> ========================================
>> Shengqiao Li
>> 
>> Research Associate
>> The Department of Statistics
>> PO Box 6330
>> West Virginia University
>> Morgantown, WV 26506-6330
>> 
>> ========================================
>> 
>> On Fri, 15 Aug 2008, Duncan Murdoch wrote:
>> 
>>> shli at stat.wvu.edu wrote:
>>>>   This message is in MIME format.  The first part should be readable 
>>>> text,
>>>>   while the remaining parts are likely unreadable without MIME-aware 
>>>> tools.
>>>> 
>>>> ---559023410-851401618-1218751024=:15885
>>>> Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
>>>> Content-Transfer-Encoding: QUOTED-PRINTABLE
>>>> 
>>>> 
>>>> I didn't describe the problem clearly. It's about the number of 
>>>> distinct=20
>>>> values. So just ignore cycle issue.
>>>> 
>>>> My tests were:
>>>> 
>>>> RNGkind(kind=3D"Knuth-TAOCP");
>>>> sum(duplicated(runif(1e7))); #return 46552
>>>> 
>>>> RNGkind(kind=3D"Knuth-TAOCP-2002");
>>>> sum(duplicated(runif(1e7))); #return 46415
>>>> 
>>>> #These collision frequency suggested there were 2^30 distinct values 
>>>> by=20
>>>> birthday problem.
>>>> 
>>> The birthday problem distribution applies to independent draws, but they 
>>> are only pseudo-independent.  I think the only ways to know for sure if 
>>> there are 2^30 values are to look at the code, or run through a complete 
>>> cycle.  And you need to determine the cycle by looking at .Random.seed, 
>>> not at the returned value.
>>>> RNGkind(kind=3D"Marsaglia-Multicarry");
>>>> sum(duplicated(runif(1e7))); #return 11682
>>>> 
>>>> RNGkind(kind=3D"Super-Duper");
>>>> sum(duplicated(runif(1e7))); #return 11542
>>>> 
>>>> RNGkind(kind=3D"Mersenne-Twister");
>>>> sum(duplicated(runif(1e7))); #return 11656
>>>> 
>>>> #These indicated there were 2^32 distinct values, which agrees with 
>>>> the=20
>>>> help info.
>>>> 
>>>> 
>>> If there are 2^30 distinct values for the two generators above, that also 
>>> agrees with the documentation.
>>> 
>>>> RNGkind(kind=3D"Wichmann-Hill");
>>>> sum(duplicated(runif(1e7))); #return 0
>>>> 
>>>> #So for this method, there should be more than 2^32 distinct values.
>>>> 
>>>> You may not get the exact numbers, but they should be close. So how to=20
>>>> explain above problem?
>>>> 
>>> You haven't demonstrated what you claim, but if you look at the source, 
>>> you'll see that in fact the man page is wrong:  Wichmann-Hill is based on 
>>> 3 integer values, which each take on approximately 15 bits of different 
>>> values. So Wichmann-Hill could take nearly 2^45 different values (actually 
>>> 30269*30307*30323).
>>> 
>>> The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c if you 
>>> want to check the others.
>>>> I need generate a large sample without any ties, it seems to me=20
>>>> "Wichmann-Hill" is only choice right now.
>>>> 
>>> An alternative would be to construct a new value from two (or more) 
>>> runif() values, but be careful that you don't mess up the distribution 
>>> when you do that.
>>> 
>>> Duncan Murdoch
>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>>> Shengqiao Li
>>>> 
>>>> The Department of Statistics
>>>> PO Box 6330
>>>> West Virginia University
>>>> Morgantown, WV 26506-6330
>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>>> 
>>>> On Thu, 14 Aug 2008, Peter Dalgaard wrote:
>>>> 
>>>> 
>>>>> Shengqiao Li wrote:
>>>>> 
>>>>>> Hello all,
>>>>>> =20
>>>>>> I am generating large samples of random numbers. The RNG help page 
>>>>>> says:=
>>>>>> 
>>>> =20
>>>> 
>>>>>> "All the supplied uniform generators return 32-bit integer values that 
>>>>>> a=
>>>>>> 
>>>> re=20
>>>> 
>>>>>> converted to doubles, so they take at most 2^32 distinct values and 
>>>>>> long=
>>>>>> 
>>>> =20
>>>> 
>>>>>> runs will return duplicated values." But I find that the cycles are not 
>>>>>> =
>>>>>> 
>>>> the=20
>>>> 
>>>>>> same as the 32-bit integer.
>>>>>> =20
>>>>>> My test indicated that the cycles for Knuth's methods were 2^30 
>>>>>> while=20
>>>>>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated 
>>>>>> i=
>>>>>> 
>>>> n=20
>>>> 
>>>>>> 10M numbers generated by runif using Wichmann-Hill. The other three 
>>>>>> meth=
>>>>>> 
>>>> ods=20
>>>> 
>>>>>> had cycle length of 2^32.
>>>>>> =20
>>>>>> So, anybody can explain this? And any improvement to the implementation 
>>>>>> =
>>>>>> 
>>>> can=20
>>>> 
>>>>>> be made to increase the cycle length like the Wichmann-Hill method?
>>>>>> =20
>>>>>> 
>>>>> What test? These are not simple linear congruential generators. Just 
>>>>> beca=
>>>>> 
>>>> use=20
>>>> 
>>>>> you get the same value twice, it doesn't mean that the sequence is 
>>>>> repeat=
>>>>> 
>>>> ing.=20
>>>> 
>>>>> Perhaps you should read the entire help page rather than just the note.
>>>>> 
>>>>> --=20
>>>>>  O__  ---- Peter Dalgaard             =D8ster Farimagsgade 5, Entr.B
>>>>> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>>>> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
>>>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>>>>> 
>>>>> 
>>>>> 
>>>> ---559023410-851401618-1218751024=:15885--
>>>> 
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> 
>>> 
>> 
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>



More information about the R-devel mailing list