[R] RNG Cycle and Duplication

Shengqiao Li shli at stat.wvu.edu
Thu Aug 14 23:57:04 CEST 2008


I didn't describe the problem clearly. It is about the number of distinct 
values, not the cycle length, so please ignore the cycle issue.

My tests were:

RNGkind(kind="Knuth-TAOCP")
sum(duplicated(runif(1e7)))  # returns 46552

RNGkind(kind="Knuth-TAOCP-2002")
sum(duplicated(runif(1e7)))  # returns 46415

# By the birthday problem, these collision frequencies suggest that there
# are about 2^30 distinct values.


RNGkind(kind="Marsaglia-Multicarry")
sum(duplicated(runif(1e7)))  # returns 11682

RNGkind(kind="Super-Duper")
sum(duplicated(runif(1e7)))  # returns 11542

RNGkind(kind="Mersenne-Twister")
sum(duplicated(runif(1e7)))  # returns 11656

# These counts indicate about 2^32 distinct values, which agrees with the
# help page.
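
The birthday-problem reasoning can be checked numerically. The following is a minimal sketch, assuming the generator returns N equally likely values and that draws are independent; the function name expected_duplicates is hypothetical and not part of R. If n values are drawn, the expected number of distinct values is N * (1 - (1 - 1/N)^n), so the expected count from sum(duplicated(x)) is n minus that.

```r
# Expected number of duplicated entries when n values are drawn
# independently and uniformly from N possible values (an assumption
# about the generator, not a documented property).
expected_duplicates <- function(n, N) {
  # n - N * (1 - (1 - 1/N)^n), written with log1p/expm1 for accuracy
  n + N * expm1(n * log1p(-1 / N))
}

n <- 1e7
e30 <- expected_duplicates(n, 2^30)  # about 46,400
e32 <- expected_duplicates(n, 2^32)  # about 11,600
print(c(N_2_30 = e30, N_2_32 = e32))
```

Under these assumptions the two expected counts are close to the observed 46552/46415 and 11682/11542/11656, respectively.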

RNGkind(kind="Wichmann-Hill")
sum(duplicated(runif(1e7)))  # returns 0

# So this generator must produce more than 2^32 distinct values.
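
The duplicate counts reported above can also be inverted to estimate the number of distinct values directly. This is a rough sketch that uses the first-order birthday approximation d = n^2 / (2N), again assuming equally likely values and independent draws; the helper name estimate_log2_values is hypothetical. It cannot be applied to a count of zero, which only says that N is much larger than n^2.

```r
# Estimate log2 of the number of distinct values N from an observed
# duplicate count d in a sample of size n, using d ~ n^2 / (2 * N).
estimate_log2_values <- function(n, d) {
  log2(n^2 / (2 * d))
}

# Duplicate counts as reported in this message (n = 1e7 each).
counts <- c("Knuth-TAOCP"          = 46552,
            "Knuth-TAOCP-2002"     = 46415,
            "Marsaglia-Multicarry" = 11682,
            "Super-Duper"          = 11542,
            "Mersenne-Twister"     = 11656)

est <- estimate_log2_values(1e7, counts)
print(round(est, 2))
```

The estimates fall near 30 for the two Knuth generators and near 32 for the other three.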

You may not get exactly these numbers, but they should be close. So how can 
the differences above be explained?

I need to generate a large sample without any ties, and it seems to me that 
"Wichmann-Hill" is the only choice right now.
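
One possible workaround with any of the generators, offered only as a sketch and not as a method from the R documentation, is to redraw the tied entries until none remain. The function name runif_no_ties is hypothetical. It assumes n is well below the number of distinct values the generator can return; otherwise the loop would be slow or would never finish. Note that the result is conditioned on having no ties, so it is no longer a plain i.i.d. sample from the generator.

```r
# Draw n uniform values and redraw any duplicated entries until the
# sample contains no ties. Assumes n is far smaller than the number
# of distinct values the current generator can produce.
runif_no_ties <- function(n) {
  x <- runif(n)
  repeat {
    dup <- duplicated(x)
    if (!any(dup)) break
    x[dup] <- runif(sum(dup))  # replace only the tied entries
  }
  x
}

set.seed(1)
x <- runif_no_ties(1e6)
print(anyDuplicated(x))  # 0 means no ties
```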

========================================
Shengqiao Li

The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330
========================================

On Thu, 14 Aug 2008, Peter Dalgaard wrote:

> Shengqiao Li wrote:
>> Hello all,
>> 
>> I am generating large samples of random numbers. The RNG help page says: 
>> "All the supplied uniform generators return 32-bit integer values that are 
>> converted to doubles, so they take at most 2^32 distinct values and long 
>> runs will return duplicated values." But I find that the cycles are not the 
>> same as the 32-bit integer.
>> 
>> My test indicated that the cycles for Knuth's methods were 2^30 while 
>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated in 
>> 10M numbers generated by runif using Wichmann-Hill. The other three methods 
>> had cycle length of 2^32.
>> 
>> So, can anybody explain this? And can the implementation be improved to 
>> increase the cycle length, as with the Wichmann-Hill method?
>> 
> What test? These are not simple linear congruential generators. Just because 
> you get the same value twice, it doesn't mean that the sequence is repeating. 
> Perhaps you should read the entire help page rather than just the note.
>
> -- 
>  O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>
>

