[R] clusterCall with replicate function

Michael Gormley mpg33 at drexel.edu
Tue Aug 21 21:36:03 CEST 2007


I am trying to run a Monte Carlo process using snow with an MPI cluster.  I 
have about thirty processors to run the algorithm on, and I want to run it 
5000 times and take the average of the output.  A very simple way to do this 
is to divide 5000 by the number of processors to get a number n and tell 
each processor to run the algorithm n times.  I realize there are more 
efficient ways to manage the parallelization.  To implement this I used the 
clusterCall command with the replicate function, along the lines of
clusterCall(cl, replicate, n, function(args)).  Because my function is a 
Monte Carlo process, it relies on drawing from random distributions to 
generate output.  When I do this, all of my processors generate the same 
random numbers.  I copied the following from the console as a simple 
example:
cl <- makeCluster(2, type = "MPI")
 clusterCall(cl, replicate, 2, runif(2))
[[1]]
0.6533959    0.6533959
0.1071051    0.1071051
[[2]]
0.6533959    0.6533959
0.1071051    0.1071051
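
A possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: clusterCall evaluates its arguments on the master before sending them, so each worker receives the finished value of runif(2) and replicate simply repeats that constant.  The snippet below mimics this locally with do.call, with no cluster involved; the names shipped and fresh are illustrative only.

```r
set.seed(1)

# Mimics what a worker receives: replicate() is called with a vector that
# has already been evaluated, so every repetition returns the same values.
shipped <- do.call(replicate, list(2, runif(2)))

# Ordinary use: the expression runif(2) is re-evaluated on each repetition.
fresh <- replicate(2, runif(2))

print(shipped)  # both columns are identical
print(fresh)    # columns differ
```

If this reading is right, the duplicated columns come from the argument being evaluated once, not from the workers sharing a seed.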

The duplication is not alleviated by using clusterApply to set a random seed 
for each processor, and it seems to be related to the use of the replicate 
function within clusterCall.  I have rearranged the code so that replicate 
calls clusterCall (i.e. replicate(2, clusterCall(cl, runif, 2), 
simplify=F)), which resolves the random number issue.  However, this 
involves much more communication between master and slaves and results in 
slower computation.  Will rsprng fix this problem?  Is there a better way 
to do this without using replicate?
I hope this is somewhat clear.
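
For illustration, here is a minimal sketch of the divide-the-runs approach in which replicate runs inside a function on each worker, so the random draws happen on the worker and the master is contacted only once per worker.  It makes several assumptions: it uses the base parallel package (which provides the snow-style interface) with two local workers instead of an MPI cluster, one_run is a hypothetical stand-in for the real algorithm, and clusterSetRNGStream is used to give each worker its own random number stream.  In snow itself, clusterSetupRNG plays a similar role and relies on rlecuyer or rsprng.

```r
library(parallel)

# Hypothetical stand-in for one run of the Monte Carlo algorithm.
one_run <- function() mean(runif(2))

total_runs <- 5000
cl <- makeCluster(2)                  # assumption: 2 local workers, not MPI
clusterSetRNGStream(cl, iseed = 123)  # independent stream for each worker

# Split the runs as evenly as possible across the workers.
k <- length(cl)
n_per_worker <- rep(total_runs %/% k, k)
extra <- total_runs %% k
n_per_worker[seq_len(extra)] <- n_per_worker[seq_len(extra)] + 1

# Each worker receives its count n and the function f, then calls f() n times
# locally; replicate re-evaluates f() on every repetition.
results <- clusterApply(cl, n_per_worker,
                        function(n, f) replicate(n, f()),
                        f = one_run)

estimate <- mean(unlist(results))
stopCluster(cl)
```

Passing a function rather than an expression such as runif(2) is the design choice that matters here: the function travels to the worker unevaluated, so nothing random is computed on the master.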

Thanks,
Mike


