[Rd] [R] RNG Cycle and Duplication (PR#12540)

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Aug 15 17:40:39 CEST 2008


On Fri, 15 Aug 2008, Shengqiao Li wrote:

>
> Professor Ripley,
>
> Thank you for your solution.  So the last paragraph of the Note in RNG help 
> page will be updated since Wichmann-Hill is different from other supplied 
> uniform generators in the number of distinct values?

Please read and follow the R posting guide, and check the current version 
*before* posting as you were asked to do.

> Shengqiao Li
>
>
> On Fri, 15 Aug 2008, Prof Brian Ripley wrote:
>
>> Remember Wichmann-Hill is a composite generator.  Its composition does take 
>> more than 2^32 distinct values.
>> 
>> You still haven't identifed a problem here.  The note is to warn that 
>> runif() does repeat within a cycle, because people wrote code assuming 
>> otherwise.  It would be a poor use of runif() to rely on the low-order 
>> bits, and that's standard advice in the field.
>> 
>> For a large sample of uniforms use something like the normal inversion 
>> does, e.g. 2^(-30) * (runif(N, 0, 2^30) %% 2^30 + runif(N))
>> 
>> Please do leave R-bugs out of this: we already have 4 entries as a result 
>> of your misunderstandings and false claims.
>> 
>> On Thu, 14 Aug 2008, shli at stat.wvu.edu wrote:
>>
>>>  This message is in MIME format.  The first part should be readable text,
>>>  while the remaining parts are likely unreadable without MIME-aware tools.
>>> 
>>> ---559023410-851401618-1218751024=:15885
>>> Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
>>> Content-Transfer-Encoding: QUOTED-PRINTABLE
>>> 
>>> 
>>> I didn't describe the problem clearly. It's about the number of 
>>> distinct=20
>>> values. So just ignore cycle issue.
>>> 
>>> My tests were:
>>> 
>>> RNGkind(kind=3D"Knuth-TAOCP");
>>> sum(duplicated(runif(1e7))); #return 46552
>>> 
>>> RNGkind(kind=3D"Knuth-TAOCP-2002");
>>> sum(duplicated(runif(1e7))); #return 46415
>>> 
>>> #These collision frequency suggested there were 2^30 distinct values by=20
>>> birthday problem.
>>> 
>>> 
>>> RNGkind(kind=3D"Marsaglia-Multicarry");
>>> sum(duplicated(runif(1e7))); #return 11682
>>> 
>>> RNGkind(kind=3D"Super-Duper");
>>> sum(duplicated(runif(1e7))); #return 11542
>>> 
>>> RNGkind(kind=3D"Mersenne-Twister");
>>> sum(duplicated(runif(1e7))); #return 11656
>>> 
>>> #These indicated there were 2^32 distinct values, which agrees with the=20
>>> help info.
>>> 
>>> RNGkind(kind=3D"Wichmann-Hill");
>>> sum(duplicated(runif(1e7))); #return 0
>>> 
>>> #So for this method, there should be more than 2^32 distinct values.
>>> 
>>> You may not get the exact numbers, but they should be close. So how to=20
>>> explain above problem?
>>> 
>>> I need generate a large sample without any ties, it seems to me=20
>>> "Wichmann-Hill" is only choice right now.
>>> 
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>> Shengqiao Li
>>> 
>>> The Department of Statistics
>>> PO Box 6330
>>> West Virginia University
>>> Morgantown, WV 26506-6330
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>> 
>>> On Thu, 14 Aug 2008, Peter Dalgaard wrote:
>>> 
>>>> Shengqiao Li wrote:
>>>>> Hello all,
>>>>> =20
>>>>> I am generating large samples of random numbers. The RNG help page 
>>>>> says:=
>>> =20
>>>>> "All the supplied uniform generators return 32-bit integer values that 
>>>>> a=
>>> re=20
>>>>> converted to doubles, so they take at most 2^32 distinct values and 
>>>>> long=
>>> =20
>>>>> runs will return duplicated values." But I find that the cycles are not 
>>>>> =
>>> the=20
>>>>> same as the 32-bit integer.
>>>>> =20
>>>>> My test indicated that the cycles for Knuth's methods were 2^30 while=20
>>>>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated 
>>>>> i=
>>> n=20
>>>>> 10M numbers generated by runif using Wichmann-Hill. The other three 
>>>>> meth=
>>> ods=20
>>>>> had cycle length of 2^32.
>>>>> =20
>>>>> So, anybody can explain this? And any improvement to the implementation 
>>>>> =
>>> can=20
>>>>> be made to increase the cycle length like the Wichmann-Hill method?
>>>>> =20
>>>> What test? These are not simple linear congruential generators. Just 
>>>> beca=
>>> use=20
>>>> you get the same value twice, it doesn't mean that the sequence is 
>>>> repeat=
>>> ing.=20
>>>> Perhaps you should read the entire help page rather than just the note.
>>>> 
>>>> --=20
>>>>  O__  ---- Peter Dalgaard             =D8ster Farimagsgade 5, Entr.B
>>>> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>>> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
>>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>>>> 
>>>> 
>>> ---559023410-851401618-1218751024=:15885--
>>> 
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
>> -- 
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>> 
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list