[Rd] Parallel number stream: clusterSetRNGStream

Henrik Bengtsson henr|k@bengt@@on @end|ng |rom gm@||@com
Fri Jun 7 22:31:53 CEST 2019


Yes, I would think this behavior is intentional, but obviously I
don't know for sure. Looking at the code:

> parallel::clusterSetRNGStream
function (cl = NULL, iseed = NULL)
{
    cl <- defaultCluster(cl)
    oldseed <- if (exists(".Random.seed", envir = .GlobalEnv,
        inherits = FALSE))
        get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    else NULL
    RNGkind("L'Ecuyer-CMRG")
    if (!is.null(iseed))
        set.seed(iseed)
    nc <- length(cl)
    seeds <- vector("list", nc)
    seeds[[1L]] <- .Random.seed

You'll find that:

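1. the stream of RNG seeds originates from .Random.seed.
2a. 'iseed' is only applied if it is non-NULL, in which case it changes
the starting .Random.seed.
2b. If iseed = NULL, then the starting .Random.seed is whatever it was
when you called the function. Because the (not shown) remainder of the
function restores the original .Random.seed before returning, two calls
in a row give the same streams.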
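To make the list above concrete, here is a minimal, cluster-free sketch of the seed derivation. The helper stream_seeds() is hypothetical (not part of 'parallel'); it assumes, as I read the R 3.6.0 sources, that the rest of clusterSetRNGStream() builds one stream per worker with nextRNGStream() and then restores the caller's .Random.seed.

```r
library(parallel)  # provides nextRNGStream()

## Hypothetical helper, not part of 'parallel': mimics how
## clusterSetRNGStream() derives one L'Ecuyer-CMRG seed per worker,
## including restoring the caller's RNG state before returning.
stream_seeds <- function(n, iseed = NULL) {
  has_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  oldseed <- if (has_seed) get(".Random.seed", envir = .GlobalEnv) else NULL
  RNGkind("L'Ecuyer-CMRG")  # new state is seeded from the current one
  if (!is.null(iseed)) set.seed(iseed)
  seeds <- vector("list", n)
  seeds[[1L]] <- get(".Random.seed", envir = .GlobalEnv)
  for (i in seq_len(n - 1L)) seeds[[i + 1L]] <- nextRNGStream(seeds[[i]])
  ## Put the caller's RNG state (and hence RNG kind) back
  if (has_seed) {
    assign(".Random.seed", oldseed, envir = .GlobalEnv)
  } else {
    rm(".Random.seed", envir = .GlobalEnv)
  }
  seeds
}

set.seed(1)
before <- .Random.seed
s1 <- stream_seeds(3)
s2 <- stream_seeds(3)
identical(s1, s2)                   # TRUE: same master state, same streams
kept <- identical(.Random.seed, before)
kept                                # TRUE: master state was not advanced
runif(1)                            # forward the master RNG state
s3 <- stream_seeds(3)
identical(s1, s3)                   # FALSE: new streams
```

Only an explicit draw on the master (here runif(1)) changes what the next call derives, which is exactly the forwarding described next.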

If you use iseed = NULL, then you need to forward the RNG state
(i.e. .Random.seed) yourself. Here's an example:

set.seed(1)
library(parallel)
cl <- parallel::makeCluster(5)

str(.Random.seed)
# int [1:626] 10403 624 -169270483 -442010614 -603558397 -222347416 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1]  7  4  2 10 10

str(.Random.seed)
# int [1:626] 10403 624 -169270483 -442010614 -603558397 -222347416 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1]  7  4  2 10 10

## Forward RNG state
sample.int(1)
# [1] 1

str(.Random.seed)
# int [1:626] 10403 1 1654269195 -1877109783 -961256264 1403523942 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1] 8 6 1 7 5
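One way to avoid forwarding the RNG state by hand is to draw 'iseed' from the master's own RNG. The wrapper setRNGStreamFromMaster() below is hypothetical, a sketch rather than anything 'parallel' provides: the sample.int() draw advances the master's .Random.seed, so consecutive calls seed the workers differently, while set.seed() on the master keeps the whole run reproducible.

```r
library(parallel)

## Hypothetical wrapper, not part of 'parallel': draw 'iseed' from the
## master's RNG. The draw advances the master's .Random.seed, so the
## next call seeds the workers differently.
setRNGStreamFromMaster <- function(cl) {
  iseed <- sample.int(.Machine$integer.max, 1L)
  clusterSetRNGStream(cl, iseed = iseed)
  invisible(iseed)
}

cl <- makeCluster(2)

set.seed(1)
setRNGStreamFromMaster(cl)
x1 <- parSapply(cl, 1:2, function(i) runif(1))
setRNGStreamFromMaster(cl)
x2 <- parSapply(cl, 1:2, function(i) runif(1))  # new streams, differs from x1

set.seed(1)                                     # replay from the same master seed
setRNGStreamFromMaster(cl)
y1 <- parSapply(cl, 1:2, function(i) runif(1))  # identical to x1

stopCluster(cl)
```

The design choice here is to let the master's RNG be the single source of truth: reproducibility comes from set.seed() on the master, and fresh worker streams come for free on every call.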


FYI, you see similar behavior with parallel::mclapply():

set.seed(1)
library(parallel)
RNGkind("L'Ecuyer-CMRG")
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -1.2673735  0.9045952  1.9502072
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -1.2673735  0.9045952  1.9502072
## Forward RNG state
sample.int(1)
# [1] 1
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -0.09117479 -1.07803714  0.13924063

I can see pros and cons with this behavior, but I think the default is
risky. For instance, it's not hard to imagine an implementation of a
resampling algorithm where you have the option to run it via lapply()
or via parallel::mclapply(); in the latter case, there is a non-zero
probability that such an implementation produces identical samples on
repeated calls.
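The resampling risk can be sketched as follows, assuming a Unix-alike where mclapply() forks. boot_means() is a hypothetical bootstrap helper with a parallel toggle; with RNGkind("L'Ecuyer-CMRG") and mc.set.seed = TRUE, its parallel path returns the same replicates on consecutive calls because the master's .Random.seed is never advanced, whereas its sequential path does not.

```r
library(parallel)

## Hypothetical resampler that can run sequentially or in parallel.
## Assumes a Unix-alike, where mclapply() forks child processes.
boot_means <- function(x, B, parallel = FALSE) {
  one_rep <- function(b) mean(sample(x, replace = TRUE))
  if (parallel) {
    unlist(mclapply(seq_len(B), one_rep, mc.cores = 2L, mc.set.seed = TRUE))
  } else {
    vapply(seq_len(B), one_rep, numeric(1))
  }
}

RNGkind("L'Ecuyer-CMRG")
set.seed(42)
x <- rnorm(20)

p1 <- boot_means(x, B = 4, parallel = TRUE)
p2 <- boot_means(x, B = 4, parallel = TRUE)
identical(p1, p2)  # TRUE: the "independent" replicates repeat exactly

s1 <- boot_means(x, B = 4)
s2 <- boot_means(x, B = 4)
identical(s1, s2)  # FALSE: sequential draws advance .Random.seed
```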

Proper parallel RNG can be tricky.

/Henrik

On Fri, Jun 7, 2019 at 7:09 AM Colin Gillespie <csgillespie using gmail.com> wrote:
>
> Dear All,
>
> Is the following expected behaviour?
>
> set.seed(1)
> library(parallel)
> cl = makeCluster(5)
> clusterSetRNGStream(cl, iseed = NULL)
> parSapply(cl, 1:5, function(i) sample(1:10, 1))
> # 7  4  2 10 10
> clusterSetRNGStream(cl, iseed = NULL)
> parSapply(cl, 1:5, function(i) sample(1:10, 1))
> # 7  4  2 10 10
> stopCluster(cl)
>
> The documentation could be read either way, e.g.
>
>  * iseed: An integer to be supplied to set.seed, or NULL not to set
> reproducible seeds.
>
> From Details
>
> .... optionally setting the seed of the streams by set.seed(iseed)
> (otherwise they are set from the current seed of the master process:
> after selecting the L'Ecuyer generator).
>
> As may be guessed, this caught me out, since I was expecting the same
> behaviour as set.seed(NULL).
>
> Thanks
>
> Colin
>
> ----------
>
> R version 3.6.0 (2019-04-26)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


