[R] Wichmann-Hill Random Number Generator and the Birthday Problem

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Aug 19 08:45:12 CEST 2008


Please don't lie!  You falsely claimed:

> In R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in 
> RNG document.

The period in the help has been unchanged since at least Jan 2000 (since 
Random.Rd has not had any changes to those lines since then).

Your coincidence calculations may be correct for _independent_ draws from 
a discrete distribution on M values, but independence is not satisfied.
Yet again, you are trying to do things that any good text on simulation 
would warn you against, and which (in a thread on R-devel) you have 
already been told a good way to do.

On Sun, 17 Aug 2008, Shengqiao Li wrote:

>
> On Sun, 17 Aug 2008, Duncan Murdoch wrote:
>
>> Shengqiao Li wrote:
>>> Dear all,
>>> 
>>> Recently I am generating large random samples (10M) and any duplicated 
>>> numbers are not desired.
>>> We tried several RNGs in R and found Wichmann-Hill did not produce 
>>> duplications.
>>> 
>>> The duplication problem is the interesting birthday problem. If there are 
>>> M possible numbers, randomly draw N numbers from them,
>>> the average number of dupilcations D = N(N-1)/2/M.
>>> 
>>> For Knuth-TAOCP and Knuth-TAOCP-2002, M=2^30, since this modulus is used. 
>>> D = 46566.12 for N=10M samples.
>>> 
>>> For Marsaglia-Multicarry, Super-Duper and Mersene-Twister, M=2^32. D = 
>>> 11641.53 for N = 10M samples.
>>> 
>>> My testing results (see below) agree with above analysis. But for 
>>> Wichmann-Hill, it wasn't. Wichmann-Hill's cycle is 6.9536e12 (refer to RNG 
>>> help by ?RNG and Whichmann's correction in 1984). Thus M <= 6.9536e12. D 
>>>> = 7.19052 when N=1e7 and D>= 179.763 for N=5e7. 
>>> But in my tests, duplications were surprisingly not observed.
>>> 
>>> It seems that Wichmann-Hill has a much larger cycle than the one 
>>> documented!
>>> 
>>> Anybody can solve this puzzle?
>>> 
>> 
>> As I told you, look at the code. The cycle length of Wichmann-Hill in R 
>> appears to be 2.8e13, and that also appears to be M in your notation.  Your 
>> birthday problem calculation does not apply here.
>> You could probably get a good approximation to it by rounding the 
>> Wichmann-Hill output (e.g. look at round(2^30 * runif()), or maybe more 
>> severe rounding).
>> 
>> M is larger than what was previously documented, but Brian Ripley has 
>> corrected the documentation.
>
> Wichmann claimed 2.78X10^13 in his 1982 original paper. He made correction in 
> 1984, "The period of the generator was incorrectly given as 2.78 X 10^13. In 
> fact its period is only a quarter of that value: 6.95X10^12." In R version 
> 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG document. That 
> is the currently documented cycle is 6.95X10^12 not 2.78X10^13 nor your 
> approxmation 2.8e13.
>
> So, is the cycle claimed in 1982 correct or the one claimed in 1984?
>
> Even if the larger one 2.78e13 claimed in 1982 is correct,  around 45 
> duplications were expected for 50M samples. But I got 0.  My testing method 
> is based on the example code in the last third line in RNG help.
>
> Shengqiao Li
>
>> 
>> Duncan Murdoch
>>> Regards,
>>> 
>>> Shengqiao Li
>>> 
>>> Department of Statistics
>>> West Virgina Unversity
>>> 
>>> ==============Testing===================
>>> 
>>> 
>>>> RNGkind(kind="Knuth-TAOCP");
>>>> RNGkind();
>>>> 
>>> [1] "Knuth-TAOCP" "Inversion"
>>> 
>>>> sum(duplicated(runif(1e7)));
>>>> 
>>> [1] 46216
>>> 
>>>> RNGkind(kind="Knuth-TAOCP-2002");
>>>> RNGkind();
>>>> 
>>> [1] "Knuth-TAOCP-2002" "Inversion"
>>> 
>>>> sum(duplicated(runif(1e7)));
>>>> 
>>> [1] 46373
>>> 
>>> 
>>>> RNGkind(kind="Marsaglia-Multicarry");
>>>> RNGkind();
>>>> 
>>> [1] "Marsaglia-Multicarry" "Inversion"
>>> 
>>>> sum(duplicated(runif(1e7)));
>>>> 
>>> [1] 11671
>>> 
>>>> RNGkind(kind="Super-Duper");
>>>> RNGkind();
>>>> 
>>> [1] "Super-Duper" "Inversion"
>>> 
>>>> sum(duplicated(runif(1e7)));
>>>> 
>>> [1] 11704
>>> 
>>>> RNGkind(kind="Mersenne-Twister");
>>>> RNGkind();
>>>> 
>>> [1] "Mersenne-Twister" "Inversion"
>>> 
>>>> sum(duplicated(runif(1e7)));
>>>> 
>>> [1] 11508
>>> 
>>>> RNGkind(kind="Wichmann-Hill");
>>>> RNGkind();
>>>> 
>>> [1] "Wichmann-Hill" "Inversion"
>>> 
>>>> sum(duplicated(runif(1e7)));
>>>> 
>>> [1] 0
>>> 
>>> 
>>>> gc()
>>>>
>>>            used (Mb) gc trigger  (Mb)  max used   (Mb)
>>> Ncells  199157  5.4     646480  17.3  10149018  271.1
>>> Vcells 4497151 34.4  108714629 829.5 169585390 1293.9
>>> 
>>> 
>>>> sum(duplicated(runif(5e7)))
>>>> 
>>> [1] 0
>>> 
>>> 
>>> 
>>>> sessionInfo()
>>>> 
>>> R version 2.7.1 (2008-06-23)
>>> i386-pc-mingw32
>>> 
>>> locale:
>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
>>> States.1252;LC_MONETARY=English_United 
>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.7.1
>>> 
>>> ==============End of Testing===================
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> 
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list