[Rd] 'parallel' package changes '.Random.seed'

Henric Winell nilsson.henric at gmail.com
Fri Mar 7 11:21:19 CET 2014


Dear Prof Ripley,

Thank you for your kind reply.  Please find my comments below.

> On 2014-03-06 13:29, Prof Brian Ripley wrote:
>
>> On 06/03/2014 10:17, Henric Winell wrote:
>> Hi,
>>
>> I've implemented parallelization in one of my packages using the
>> 'parallel' package -- many thanks for providing it!
>>
>> In my package I'm importing 'parallel' and so added it to the
>> DESCRIPTION file's 'Imports:' field and also added an
>> 'importFrom("parallel", ...)' statement in the NAMESPACE file.
>>
>> Parallelization works nicely, but my package no longer passes the
>> parts of its (unparallelized) checks that depend on random number
>> generation, e.g., the simulated data in the check suite are no
>> longer the same as before parallelization was added.  This seems to
>> be due to 'parallel' changing '.Random.seed' when loading its name
>> space:
>>
>> > set.seed(1)
>> > rs1 <- .Random.seed
>> > rnorm(1)
>> [1] -0.6264538
>> > set.seed(1)
>> > rs2 <- .Random.seed
>> > identical(rs1, rs2)
>> [1] TRUE
>> > loadNamespace("parallel")
>> <environment: namespace:parallel>
>> > rs3 <- .Random.seed
>> > identical(rs1, rs3)
>> [1] FALSE
>> > rnorm(1)
>> [1] -0.3262334
>> > set.seed(1)
>> > rs4 <- .Random.seed
>> > identical(rs1, rs4)
>> [1] TRUE
>>
>> I've taken a look at the 'parallel' source code, and in a few places
>> a call to 'runif(1)' is issued.  So, what effectively seems to happen
>> when 'parallel' is loaded is
>>
>> > set.seed(1)
>> > runif(1)
>> [1] 0.2655087
>> > rnorm(1)
>> [1] -0.3262334
>>
>> which reproduces the above.  But is this really necessary?
>
> Yes, in the places it is used.

I apologize for not expressing myself more clearly here.  I do not 
dispute that it is necessary to call runif(1L).

What I meant to ask was: Is it really necessary for 'parallel' to change 
'.Random.seed' when its namespace is loaded?

> Two are to do with setting up parallel streams when called,

Agreed, and those are not relevant here.

> and the other is only called if R_PARALLEL_PORT is unset.

But can't that action, i.e., choosing a random port in the 11000:11999 
range, be implemented so that it doesn't change '.Random.seed' when the 
namespace is loaded?
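One possible approach, sketched here under the assumption that the port only needs to be drawn from 11000:11999 and that saving and restoring the global RNG state is acceptable, is to wrap the draw so that '.Random.seed' is put back afterwards.  The names with_preserved_seed() and choose_random_port() are hypothetical and not part of 'parallel':

```r
# Hypothetical helper (not part of 'parallel'): evaluate `expr`, then put the
# global RNG state back exactly as it was, including removing .Random.seed
# if it did not exist beforehand.
with_preserved_seed <- function(expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
  }
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(".Random.seed", envir = genv)
    }
  })
  expr  # `expr` is a promise, so it is only evaluated here, after the save
}

# Hypothetical port chooser: draws from 11000:11999 without leaving a trace
# in the caller's random number stream.
choose_random_port <- function() {
  with_preserved_seed(sample(11000:11999, 1L))
}

set.seed(1)
port <- choose_random_port()
rnorm(1)  # same value as rnorm(1) straight after set.seed(1)
```

The design choice here is to restore the state rather than avoid the RNG altogether, so the port is still random while a user's earlier set.seed() call keeps its effect.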

> So set R_PARALLEL_PORT.

Thanks for suggesting it.  This works nicely, as shown in my earlier 
follow-up post, and makes the package pass its own tests.

The downside is that every other user who relies on reproducible random 
number generation will need the same intervention.
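For reference, a minimal sketch of that workaround, assuming the variable is set before 'parallel' is loaded in the session (for instance at the top of a test file, or in the environment used for R CMD check).  The port number 11500 is an arbitrary example value in the 11000:11999 range:

```r
# Must run before the 'parallel' namespace is loaded in this session;
# 11500 is an arbitrary example port.
Sys.setenv(R_PARALLEL_PORT = "11500")

set.seed(1)
rs_before <- .Random.seed
invisible(loadNamespace("parallel"))
rs_after <- .Random.seed

# With the port fixed, loading 'parallel' should not consume a random number.
identical(rs_before, rs_after)
```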

> But your presumptions are wrong: R is perfectly entitled to use its
> random number generator, as is other code running in the R
> interpreter.  Once your call returns you cannot expect the session
> state to remain unchanged.

You're completely right, of course.

But I believe that it may lead to surprising behaviour for some 
unsuspecting users (me included), and so should be avoided if possible. 
But maybe you're surprised that I'm surprised?

I've looked at your implementation in the 'boot' package, where 
'parallel' is not explicitly imported and thus '.Random.seed' is 
untouched after loading the 'boot' namespace.  Is it preferable not to 
import 'parallel' and instead to access the relevant 'parallel' functions 
using the '::' operator, as you did there?  Please advise.
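To make the alternative concrete, here is a hypothetical package function written in that style.  It assumes 'parallel' is declared in DESCRIPTION (Writing R Extensions asks that namespaces reached via '::' be listed in Imports, Suggests or Enhances) but has no importFrom() entry in NAMESPACE, so the 'parallel' namespace is only loaded the first time the function runs rather than when the package itself is loaded.  The function name and the toy simulation are illustrative only:

```r
# Hypothetical package function: 'parallel' is reached with '::' instead of
# importFrom(), so loading this package does not by itself load 'parallel'.
simulate_means <- function(n_sims, n_obs = 10L, cores = 1L) {
  # mc.cores > 1 is not supported on Windows; 1 runs serially everywhere.
  res <- parallel::mclapply(seq_len(n_sims),
                            function(i) mean(rnorm(n_obs)),
                            mc.cores = cores)
  unlist(res)
}

set.seed(1)
simulate_means(3)
```

Note that with an R version behaving as described in this thread, the first call would still load 'parallel' and so could advance the random stream; the difference is only that attaching the package no longer does.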

>> And more importantly (at least to me):  Can it somehow be avoided?
>>
>> The current state of affairs is a bit unfortunate, since it implies
>> that a user just by loading the new parallelized version of my
>> package can no longer reproduce any subsequent results depending on
>> random number generation (unless a call to 'set.seed' was issued
>> *after* attaching my package).
>>
>> I'd be most grateful for any help that you're able to provide here.
>> Many thanks!
>>
>> Kind regards,
>> Henric Winell
>>
>>
>> > sessionInfo()
>> R Under development (unstable) (2014-01-26 r64897)
>
> See what the posting guide says about updating before posting ....

Thanks for reminding me -- an update is long overdue.

Before posting I checked the SVN repository and found that the relevant 
code in 'parallel' was the same as in the version I used.


Henric Winell



>
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.1.0 parallel_3.1.0 tools_3.1.0
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595


