[R] clusterCall with replicate function

Luke Tierney luke at stat.uiowa.edu
Tue Aug 21 22:29:03 CEST 2007


replicate uses non-standard evaluation of its expr argument, so it
doesn't work well with clusterCall or clusterApply. The way you are
using it you will get the same answer on all nodes no matter what you
do about the generators, because the random draws are all being done
on the master.
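
A small sketch of the effect, runnable without a cluster. It assumes
that do.call is a fair local stand-in for how the call arrives on a
node: the argument list is evaluated first, so replicate only ever
sees the numbers, not the expression runif(2). The names m and ok are
just for illustration.

```r
# Local stand-in for what a node receives: do.call() evaluates the
# argument list first, much as the master does before sending the call.
m <- do.call(replicate, list(2, runif(2)))
print(m)   # both columns hold the same pair of numbers

# Called directly, replicate() re-evaluates the expression on each pass.
ok <- replicate(2, runif(2))
print(ok)  # a fresh draw in each column
```

The first matrix has the same shape as the output in the quoted
message below, where every column repeats one draw.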

You can use clusterEvalQ, as in

     clusterEvalQ(cl, replicate(2, runif(2)))

Or you can use something like

     clusterCall(cl, function(n) replicate(n, runif(2)), 2)

This will still probably give identical results most of the time if
you don't do something about the RNG.  clusterSetupRNG will help you
with that -- just install the rlecuyer package and call
clusterSetupRNG.
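
Putting the pieces together, a minimal sketch might look like the
following. It assumes snow and rlecuyer are installed and uses a local
two-node socket cluster purely for illustration (the original question
used MPI); the name res is hypothetical.

```r
library(snow)

# Assumption: two local socket workers instead of an MPI cluster.
cl <- makeCluster(2, type = "SOCK")

# Give each node its own independent stream (uses the rlecuyer package).
clusterSetupRNG(cl, type = "RNGstream")

# The function is shipped to each node, so replicate() and runif()
# are both evaluated there rather than on the master.
res <- clusterCall(cl, function(n) replicate(n, runif(2)), 2)

stopCluster(cl)
```

res is a list with one 2 x 2 matrix per node; with separate streams
set up, the matrices should differ from node to node.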

Best,

luke

On Tue, 21 Aug 2007, Michael Gormley wrote:

> I am trying to run a monte carlo process using snow with a MPI cluster.  I
> have ~thirty processors to run the algorithm on and I want to run it 5000
> times and take the average of the output.  A very simple way to do this is
> to divide 5000 by the number of processors to get a number n and tell each
> processor to run the algorithm n times.  I realize there are more efficient
> ways to manage the parallelization.   To implement this I used the
> clusterCall command with the replicate function along the lines of
> clusterCall(cl, replicate, n, function(args)).  Because my function is a
> monte carlo process it relies on drawing from random distributions to
> generate output.  When I do this, all of my processors generate the same
> random numbers.  I copied the following from the command space for a simple
> example:
> cl<-makeCluster(cl, replicate,1,runif(2))
> clusterCall(cl, replicate, 2, runif(2))
> [[1]]
> 0.6533959    0.6533959
> 0.1071051    0.1071051
> [[2]]
> 0.6533959    0.6533959
> 0.1071051    0.1071051
>
> This is not alleviated by using clusterApply to set a random seed for each
> processor and seems to be related to the use of the replicate function
> within clusterCall.  I have rearranged the function so that replicate is
> used to call the clusterCall function (ie. replicate(2, clusterCall(cl,
> runif,2),simplify=F) ) and resolved the random number issue.  However, this
> also involves much more communication between master and slaves and results
> in slower computation time.   Will rsprng fix this problem?  Is there a
> better way to do this without using replicate?
> I hope this is somewhat clear.
>
> Thanks,
> Mike
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


