[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

Radford Neal radford at cs.toronto.edu
Sat Nov 4 16:33:10 CET 2017


In the code below, you seem to be essentially using the random number
generator to implement a hash function.  This isn't a good idea.

My impression is that pseudo-random number generation methods are
generally evaluated by whether the sequence produced from any seed
"appears" to be random.  Informally, there may be some effort to make
long sequences started with seeds 1, 2, 3, etc. appear unrelated,
since that is a common use pattern when running a simulation several
times to check on variability.  But you are relying on the FIRST
number from each sequence being apparently unrelated to the seed.  
I think few or none of the people designing pseudo-random number
generators evaluate their methods by that criterion.

There is, however, a large literature on hash functions, which is
what you should look at.
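
As a rough illustration of the hash-based route (not a vetted recipe), the sketch below maps an integer to a value in [17, 26) by hashing it. It assumes the CRAN package digest is installed. The helper name seed_to_unif, the choice of MD5, and the use of 28 bits are all assumptions made for this example.

```r
# Assumes the CRAN package 'digest' is installed; it is not part of base R.
# seed_to_unif() is a hypothetical helper written for this illustration.
seed_to_unif <- function(seed, min = 17, max = 26) {
  # Hash the decimal text of the seed; serialize = FALSE hashes the string itself.
  h <- digest::digest(as.character(seed), algo = "md5", serialize = FALSE)
  # Keep the first 7 hex digits (28 bits) so the value fits in an R integer.
  u <- strtoi(substr(h, 1, 7), base = 16L) / 16^7
  # Rescale from [0, 1) to [min, max).
  min + (max - min) * u
}

# digest() hashes one object at a time, so apply the helper per seed.
if (requireNamespace("digest", quietly = TRUE)) {
  print(sapply(c(86548915L, 86551615L, 86566163L), seed_to_unif))
}
```

The result depends only on the input value and never touches R's random number generator state, so it is unaffected by set.seed().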

But if you want a quick fix, perhaps looking not at the first number
in the sequence, but rather (say) the 10th, might be preferable.
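
A minimal sketch of that quick fix, reusing the seeds and range from the quoted code, might look like the following. The helper name kth_draw and the default k = 10 are illustrative choices only, and nothing here guarantees that the 10th draw behaves like a good hash of the seed.

```r
# Seeds copied from the quoted message.
seeds <- c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L,
           86584272L, 86620568L, 86724613L, 86756002L, 86768593L,
           86772411L, 86781516L, 86794389L, 86805854L, 86814600L,
           86835092L, 86874179L, 86876466L, 86901193L, 86987847L,
           86988080L)

# Hypothetical helper: seed the generator, draw k uniforms, keep only the k-th.
kth_draw <- function(seed, k = 10, min = 17, max = 26) {
  set.seed(seed)
  runif(k, min, max)[k]
}

tenth_values <- sapply(seeds, kth_draw)
summary(tenth_values)
```

Comparing this summary with the one from the original code shows whether the bunching persists for later draws.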

   Radford Neal


> > seeds = c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L,
> +   86584272L,
> +   86620568L, 86724613L, 86756002L, 86768593L, 86772411L, 86781516L,
> +   86794389L, 86805854L, 86814600L, 86835092L, 86874179L, 86876466L,
> +   86901193L, 86987847L, 86988080L)
> >
> > random_values = sapply(seeds, function(x) {
> +   set.seed(x)
> +   y = runif(1, 17, 26)
> +   return(y)
> + })
> >
> > summary(random_values)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   25.13   25.36   25.66   25.58   25.83   25.94


