[Rd] portable parallel seeds project: request for critiques

Petr Savicky savicky at cs.cas.cz
Tue Feb 21 14:04:27 CET 2012

On Fri, Feb 17, 2012 at 02:57:26PM -0600, Paul Johnson wrote:
> I've got another edition of my simulation replication framework.  I'm
> attaching 2 R files and pasting in the readme.
> I would especially like to know if I'm doing anything that breaks
> .Random.seed or other things that R's parallel uses in the
> environment.
> In case you don't want to wrestle with attachments, the same files are
> online in our SVN
> http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/


In the description of your project in the file


you argue as follows

  Question: Why is this better than the simple old approach of
  setting the seeds within each run with a formula like
  set.seed(2345 + 10 * run)
  Answer: That does allow replication, but it does not assure
  that each run uses non-overlapping random number streams. It
  offers absolutely no assurance whatsoever that the runs are
  actually non-redundant.

The following demonstrates that set.seed() with the default
generator can indeed produce correlated streams.

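  # One step of the linear congruential generator 69069 * x + 1 (mod 2^32),
  # with input and result kept in the signed 32-bit integer range.
  step <- function(x)
  {
      x[x < 0] <- x[x < 0] + 2^32
      x <- (69069 * x + 1) %% 2^32
      x[x > 2^31] <- x[x > 2^31] - 2^32
      x
  }

  n <- 1000
  seed1 <- 124370417 # any seed
  seed2 <- step(seed1)

  # Generate one stream from each seed
  set.seed(seed1)
  x <- runif(n)
  set.seed(seed2)
  y <- runif(n)

  rbind(seed1, seed2)
  # Compare x shifted by one position against y
  table(x[-1] == y[-n])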

The output is

  seed1 124370417
  seed2 205739774

  FALSE  TRUE 
      5   994 

This means that if the streams x and y are generated from the two
seeds above, then y is almost exactly equal to x shifted by one
position: y[i] equals x[i+1] in 994 of the 999 compared pairs.
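
A minimal sketch that wraps the experiment above in a function, so
that it can be repeated for other seeds. The names lcg_step and
shifted_matches are hypothetical helpers introduced only for this
illustration; the arithmetic is the same as in step() above, and the
default Mersenne-Twister generator is assumed.

```r
# Same arithmetic as step() above: one step of 69069 * x + 1 (mod 2^32),
# mapped back to the signed 32-bit range accepted by set.seed().
lcg_step <- function(x) {
    x[x < 0] <- x[x < 0] + 2^32
    x <- (69069 * x + 1) %% 2^32
    x[x > 2^31] <- x[x > 2^31] - 2^32
    x
}

# Count the positions at which the stream seeded with lcg_step(seed)
# coincides with the stream seeded with seed, shifted by one.
shifted_matches <- function(seed, n = 1000) {
    set.seed(seed)
    x <- runif(n)
    set.seed(lcg_step(seed))
    y <- runif(n)
    sum(x[-1] == y[-n])
}

# The seed used in the example above
shifted_matches(124370417)

# A few other, arbitrarily chosen seeds
sapply(c(1, 42, 2012), shifted_matches)
```

A count close to n - 1 for a given seed indicates that the two streams
overlap almost completely, as in the example above.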

What is the current state of your project?

Petr Savicky.
