[R] Wichmann-Hill Random Number Generator and the Birthday Problem

Duncan Murdoch murdoch at stats.uwo.ca
Sun Aug 17 12:24:28 CEST 2008


Shengqiao Li wrote:
> Dear all,
>
> Recently I am generating large random samples (10M) and any duplicated 
> numbers are not desired.
> We tried several RNGs in R and found Wichmann-Hill did not produce 
> duplications.
>
> The duplication problem is the interesting birthday problem. If there are 
> M possible numbers, randomly draw N numbers from them,
> the average number of dupilcations D = N(N-1)/2/M.
>
> For Knuth-TAOCP and Knuth-TAOCP-2002, M=2^30, since this modulus is used. 
> D = 46566.12 for N=10M samples.
>
> For Marsaglia-Multicarry, Super-Duper and Mersene-Twister, M=2^32. D = 
> 11641.53 for N = 10M samples.
>
> My testing results (see below) agree with above analysis. But for 
> Wichmann-Hill, it wasn't. Wichmann-Hill's cycle is 6.9536e12 (refer to RNG 
> help by ?RNG and Whichmann's correction in 1984). Thus M <= 6.9536e12. D 
>   
>> = 7.19052 when N=1e7 and D>= 179.763 for N=5e7. 
>>     
> But in my tests, duplications were surprisingly not observed.
>
> It seems that Wichmann-Hill has a much larger cycle than the one 
> documented!
>
> Anybody can solve this puzzle?
>   

As I told you, look at the code. 

The cycle length of Wichmann-Hill in R appears to be 2.8e13, and that 
also appears to be M in your notation.  Your birthday problem 
calculation does not apply here.
You could probably get a good approximation to it by rounding the 
Wichmann-Hill output (e.g. look at round(2^30 * runif()), or maybe more 
severe rounding).

M is larger than what was previously documented, but Brian Ripley has 
corrected the documentation.

Duncan Murdoch
> Regards,
>
> Shengqiao Li
>
> Department of Statistics
> West Virgina Unversity
>
> ==============Testing===================
>
>   
>> RNGkind(kind="Knuth-TAOCP");
>> RNGkind();
>>     
> [1] "Knuth-TAOCP" "Inversion"
>   
>> sum(duplicated(runif(1e7)));
>>     
> [1] 46216
>   
>> RNGkind(kind="Knuth-TAOCP-2002");
>> RNGkind();
>>     
> [1] "Knuth-TAOCP-2002" "Inversion"
>   
>> sum(duplicated(runif(1e7)));
>>     
> [1] 46373
>   
>
>   
>> RNGkind(kind="Marsaglia-Multicarry");
>> RNGkind();
>>     
> [1] "Marsaglia-Multicarry" "Inversion"
>   
>> sum(duplicated(runif(1e7)));
>>     
> [1] 11671
>   
>> RNGkind(kind="Super-Duper");
>> RNGkind();
>>     
> [1] "Super-Duper" "Inversion"
>   
>> sum(duplicated(runif(1e7)));
>>     
> [1] 11704
>   
>> RNGkind(kind="Mersenne-Twister");
>> RNGkind();
>>     
> [1] "Mersenne-Twister" "Inversion"
>   
>> sum(duplicated(runif(1e7)));
>>     
> [1] 11508
>   
>> RNGkind(kind="Wichmann-Hill");
>> RNGkind();
>>     
> [1] "Wichmann-Hill" "Inversion"
>   
>> sum(duplicated(runif(1e7)));
>>     
> [1] 0
>
>   
>> gc()
>>     
>            used (Mb) gc trigger  (Mb)  max used   (Mb)
> Ncells  199157  5.4     646480  17.3  10149018  271.1
> Vcells 4497151 34.4  108714629 829.5 169585390 1293.9
>
>   
>> sum(duplicated(runif(5e7)))
>>     
> [1] 0
>
>
>   
>> sessionInfo()
>>     
> R version 2.7.1 (2008-06-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
> States.1252;LC_MONETARY=English_United 
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_2.7.1
>   
>
> ==============End of Testing===================
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list