[R] bug (?) with lapply / clusterMap / clusterApply etc

jacob at forestlidar.org
Tue Mar 22 18:46:13 CET 2016


Hello,

I have encountered a bug(?) with the parallel package. When parLapply
is called from within a function, it appears to copy the entire calling
environment (the environment of the function's interior) to every child
node in the cluster, one node at a time, which is very slow. The copied
contents are not even accessible within the child nodes, even though
they show up in the nodes' memory footprint. I may be misusing the
terms "parent" and "node" here...
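
One rough way to check this suspicion without a cluster is to compare the
serialized size of a function created inside another function against one
whose environment has been reset to the global environment. This is only a
sketch: it assumes the worker function reaches the nodes via R's standard
serialization, and make_inner, make_global and n are names made up for the
illustration.

```r
# Hypothetical helpers for illustration; not part of the parallel package.

# Returns a closure created next to a large local object x.
make_inner <- function(n = 10^6) {
  x <- rnorm(n)                       # large object in the local environment
  function(z) mean(rnorm(5000))       # closure that never uses x
}

# Same closure, but with its enclosing environment reset to the global one.
make_global <- function(n = 10^6) {
  x <- rnorm(n)
  f <- function(z) mean(rnorm(5000))
  environment(f) <- globalenv()       # drop the link to the local environment
  f
}

# Size in bytes of each function once serialized to a raw vector.
size_inner  <- length(serialize(make_inner(),  NULL))
size_global <- length(serialize(make_global(), NULL))
print(c(inner = size_inner, global = size_global))
```

If the first number exceeds the second by roughly 8 bytes per element of x,
the local environment is being serialized together with the function.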

The code below demonstrates the issue. The same parallel call is made
twice within the function, once before creating a large object and
once afterwards. Both calls should take nearly the same amount of
time. The first call takes less than 1/100th of a second, but the
second takes hundreds of times longer...

Example Code:

      library(parallel)

      #create a cluster of nodes
      if(!"clus1" %in% ls()) clus1=makeCluster(10)

      #function used to demonstrate bug
      rows_fn1=function(x,clus){

          #first set of parallel code
          print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))

          #create large vector (this overwrites the x argument)
          x=rnorm(10^7)

          #second set
          print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))

      }

      #demonstrate bug - watch task manager and see windows slowly
      #copy the vector to each node in the cluster
      rows_fn1(1:5000,clus1)

Although the child nodes bloat proportionally to the size of x in the  
parent environment, x is not available in the child nodes. The code  
above can be tweaked to add more variables (x1,x2,x3 ...) and the  
child nodes will bloat to the same degree.
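
For comparison, here is a variant in which the worker function is defined at
top level instead of inside the calling function. I would expect the
top-level version not to slow down when a large local object exists, but that
is an assumption to be tested, not a confirmed result. mean_of_draws,
rows_fn2, n and clus2 are names introduced for this sketch, and the cluster
is kept to 2 nodes so it runs quickly.

```r
library(parallel)

# Worker function defined at top level, so its enclosing environment is the
# global environment rather than the interior of the calling function.
mean_of_draws <- function(z) mean(rnorm(5000))

# Times the same parLapply call with a top-level and an inline worker function
# while a large local vector x of length n exists.
rows_fn2 <- function(clus, n = 10^7) {
  x <- rnorm(n)  # large local object, as in the original example

  t_top <- system.time(
    res <- parLapply(clus, 1:5, mean_of_draws)
  )
  t_inline <- system.time(
    parLapply(clus, 1:5, function(z) mean(rnorm(5000)))
  )

  list(results = res,
       timings = c(top_level = t_top[["elapsed"]],
                   inline    = t_inline[["elapsed"]]))
}

clus2 <- makeCluster(2)
out <- rows_fn2(clus2, n = 10^6)
print(out$timings)
stopCluster(clus2)
```

If the inline timing grows with n while the top-level timing stays flat, that
would point at the function's environment being shipped along with it.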

I am working on Windows Server 2012 with 64-bit R version 3.2.1.
I upgraded to 3.2.4 Revised and observed the same bug.

I have googled for this issue and have not found anyone else
reporting a similar problem.

I have attempted to reboot my machine without effect (aside from the obvious).

Any suggestions would be greatly appreciated!

With regards,

Jacob L Strunk
Forest Biometrician (PhD), Statistician (MSc)
and Data Munger


