[Rd] [R] RNG Cycle and Duplication (PR#12540)

Duncan Murdoch murdoch at stats.uwo.ca
Sun Aug 17 00:30:13 CEST 2008


On 16/08/2008 6:09 PM, Shengqiao Li wrote:
> Knuth's double version RNG rng-double.c dose a great job. No ties were 
> observed for 10M numbers ( totally 2^52 possible different values?). 
> In rng-double, double modulo mod_sum replaced the integer version mod_diff 
> in the integer version rng.c that is adopted by R.
> 
> The integer version uses modulus 2^30. Therefore there are only 2^30 
> distinct numbers, which is confirmed by my previous test in R.
> 
> If someday Knuth's double version is also included in R, it will be great.

I don't see what the problem is -- why not just do it?  That's what 
"user-supplied" is for.

Duncan Murdoch

> 
> 
> Shengqiao Li
> 
> 
> On Fri, 15 Aug 2008, Duncan Murdoch wrote:
> 
>> On 15/08/2008 10:28 AM, Shengqiao Li wrote:
>>> Thank you for your reply and for your suggestion. So the note in man page 
>>> could be more accurate since for an end user, man page should be more 
>>> helpful and source code is mainly for developers.
>>>
>>> I was also adviced to use Knuth's  double version RANARRAY from
>>> http://www-cs-faculty.stanford.edu/~knuth/programs.html instead of the 
>>> integer versions in R. I'm a R user. So why not also include the double 
>>> verion in R implementation?
>> You can try it using kind="user-supplied" if you like, but I suspect it's the 
>> same as "Knuth-TAOCP-2002".
>>
>> Duncan Murdoch
>>
>>> Thanks again,
>>>
>>> ========================================
>>> Shengqiao Li
>>>
>>> Research Associate
>>> The Department of Statistics
>>> PO Box 6330
>>> West Virginia University
>>> Morgantown, WV 26506-6330
>>>
>>> ========================================
>>>
>>> On Fri, 15 Aug 2008, Duncan Murdoch wrote:
>>>
>>>> shli at stat.wvu.edu wrote:
>>>>>   This message is in MIME format.  The first part should be readable 
>>>>> text,
>>>>>   while the remaining parts are likely unreadable without MIME-aware 
>>>>> tools.
>>>>>
>>>>> ---559023410-851401618-1218751024=:15885
>>>>> Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
>>>>> Content-Transfer-Encoding: QUOTED-PRINTABLE
>>>>>
>>>>>
>>>>> I didn't describe the problem clearly. It's about the number of 
>>>>> distinct=20
>>>>> values. So just ignore cycle issue.
>>>>>
>>>>> My tests were:
>>>>>
>>>>> RNGkind(kind=3D"Knuth-TAOCP");
>>>>> sum(duplicated(runif(1e7))); #return 46552
>>>>>
>>>>> RNGkind(kind=3D"Knuth-TAOCP-2002");
>>>>> sum(duplicated(runif(1e7))); #return 46415
>>>>>
>>>>> #These collision frequency suggested there were 2^30 distinct values 
>>>>> by=20
>>>>> birthday problem.
>>>>>
>>>> The birthday problem distribution applies to independent draws, but they 
>>>> are only pseudo-independent.  I think the only ways to know for sure if 
>>>> there are 2^30 values are to look at the code, or run through a complete 
>>>> cycle.  And you need to determine the cycle by looking at .Random.seed, 
>>>> not at the returned value.
>>>>> RNGkind(kind=3D"Marsaglia-Multicarry");
>>>>> sum(duplicated(runif(1e7))); #return 11682
>>>>>
>>>>> RNGkind(kind=3D"Super-Duper");
>>>>> sum(duplicated(runif(1e7))); #return 11542
>>>>>
>>>>> RNGkind(kind=3D"Mersenne-Twister");
>>>>> sum(duplicated(runif(1e7))); #return 11656
>>>>>
>>>>> #These indicated there were 2^32 distinct values, which agrees with 
>>>>> the=20
>>>>> help info.
>>>>>
>>>>>
>>>> If there are 2^30 distinct values for the two generators above, that also 
>>>> agrees with the documentation.
>>>>
>>>>> RNGkind(kind=3D"Wichmann-Hill");
>>>>> sum(duplicated(runif(1e7))); #return 0
>>>>>
>>>>> #So for this method, there should be more than 2^32 distinct values.
>>>>>
>>>>> You may not get the exact numbers, but they should be close. So how to=20
>>>>> explain above problem?
>>>>>
>>>> You haven't demonstrated what you claim, but if you look at the source, 
>>>> you'll see that in fact the man page is wrong:  Wichmann-Hill is based on 
>>>> 3 integer values, which each take on approximately 15 bits of different 
>>>> values. So Wichmann-Hill could take nearly 2^45 different values (actually 
>>>> 30269*30307*30323).
>>>>
>>>> The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c if you 
>>>> want to check the others.
>>>>> I need generate a large sample without any ties, it seems to me=20
>>>>> "Wichmann-Hill" is only choice right now.
>>>>>
>>>> An alternative would be to construct a new value from two (or more) 
>>>> runif() values, but be careful that you don't mess up the distribution 
>>>> when you do that.
>>>>
>>>> Duncan Murdoch
>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>>>> Shengqiao Li
>>>>>
>>>>> The Department of Statistics
>>>>> PO Box 6330
>>>>> West Virginia University
>>>>> Morgantown, WV 26506-6330
>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>>>>
>>>>> On Thu, 14 Aug 2008, Peter Dalgaard wrote:
>>>>>
>>>>>
>>>>>> Shengqiao Li wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>> =20
>>>>>>> I am generating large samples of random numbers. The RNG help page 
>>>>>>> says:=
>>>>>>>
>>>>> =20
>>>>>
>>>>>>> "All the supplied uniform generators return 32-bit integer values that 
>>>>>>> a=
>>>>>>>
>>>>> re=20
>>>>>
>>>>>>> converted to doubles, so they take at most 2^32 distinct values and 
>>>>>>> long=
>>>>>>>
>>>>> =20
>>>>>
>>>>>>> runs will return duplicated values." But I find that the cycles are not 
>>>>>>> =
>>>>>>>
>>>>> the=20
>>>>>
>>>>>>> same as the 32-bit integer.
>>>>>>> =20
>>>>>>> My test indicated that the cycles for Knuth's methods were 2^30 
>>>>>>> while=20
>>>>>>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated 
>>>>>>> i=
>>>>>>>
>>>>> n=20
>>>>>
>>>>>>> 10M numbers generated by runif using Wichmann-Hill. The other three 
>>>>>>> meth=
>>>>>>>
>>>>> ods=20
>>>>>
>>>>>>> had cycle length of 2^32.
>>>>>>> =20
>>>>>>> So, anybody can explain this? And any improvement to the implementation 
>>>>>>> =
>>>>>>>
>>>>> can=20
>>>>>
>>>>>>> be made to increase the cycle length like the Wichmann-Hill method?
>>>>>>> =20
>>>>>>>
>>>>>> What test? These are not simple linear congruential generators. Just 
>>>>>> beca=
>>>>>>
>>>>> use=20
>>>>>
>>>>>> you get the same value twice, it doesn't mean that the sequence is 
>>>>>> repeat=
>>>>>>
>>>>> ing.=20
>>>>>
>>>>>> Perhaps you should read the entire help page rather than just the note.
>>>>>>
>>>>>> --=20
>>>>>>  O__  ---- Peter Dalgaard             =D8ster Farimagsgade 5, Entr.B
>>>>>> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>>>>> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
>>>>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>>>>>>
>>>>>>
>>>>>>
>>>>> ---559023410-851401618-1218751024=:15885--
>>>>>
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>



More information about the R-devel mailing list