[R] Parallel computing with snow

Gang Chen gangchen6 at gmail.com
Fri Jan 2 23:36:11 CET 2009


I've been using parApply() from the snow package for parallel computing,
with the following lines in R 2.8.1:

  library(snow)
  nNodes <- 4
  cl <- makeCluster(nNodes, type = "SOCK")
  fm <- parApply(cl, myData, c(1,2), func1, ...)

My Mac (OS X 10.4.11) has two dual-core processors, so I expected all
4 worker processes to run simultaneously. With the first job, however,
only two of the workers (PIDs 362 and 364 below) were busy, with roughly
the same CPU time (4th column), while the other two sat almost idle. I
assume the first row, PID 357, is the main process from which I started R:

 PID  COMMAND    %CPU      TIME
 357  R          0.0%   0:15.81
 362  R         99.8%  11:41.07
 364  R        100.3%  12:26.43
 366  R          0.0%   0:01.67
 368  R          0.0%   0:01.68

Why wasn't the CPU time split roughly equally across the 4 workers,
instead of two of them being barely used?
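One way to reason about this is a small simulation, offered only as a sketch. It assumes (I believe this matches snow, but have not verified it against the 2009 sources) that parApply() hands each worker one contiguous block of cells up front rather than assigning cells dynamically. The cell count and the per-cell costs below are made up for illustration; splitIndices() is taken from the parallel package bundled with later R versions, and snow exports a function of the same name.

```r
library(parallel)  # bundled with R >= 2.14.0; snow also exports splitIndices()

n_cells <- 100     # hypothetical: e.g. a 10 x 10 grid of (row, column) cells
n_nodes <- 4

# Hypothetical per-cell cost: the first half of the cells is expensive,
# the second half is nearly free (e.g. the function returns early there).
cost <- c(rep(10, 50), rep(0.01, 50))

# Contiguous blocks of cell indices, one block per worker
chunks <- splitIndices(n_cells, n_nodes)

# Total simulated work assigned to each worker
node_load <- sapply(chunks, function(idx) sum(cost[idx]))
print(round(node_load, 2))
```

Under these assumptions the first two workers carry almost all of the load while the last two finish almost immediately, which resembles the pattern in the table above. Whether that is what happens with func1 depends on how its running time varies across the cells of myData.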

I also tried a different job, fm <- parApply(cl, myData, c(1,2),
func2, ...). The result was somewhat different: all 4 workers were
involved, but the CPU time was still not distributed evenly
either:

 PID  COMMAND    %CPU      TIME
 413  R          0.0%   0:18.46
 419  R         93.3%   2:53.62
 421  R         93.6%   6:07.85
 423  R         92.8%   5:12.13
 425  R         93.3%   1:39.73

What gives? Why does worker usage differ between the two jobs?
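If uneven per-cell cost is the cause, one thing that could be tried is load-balanced scheduling with clusterApplyLB(), which gives each worker one task at a time and hands the next task to whichever worker becomes free. The following is a minimal sketch, not a drop-in replacement: myData and func1 are small hypothetical stand-ins, the cluster has only 2 workers to keep it light, and it uses the parallel package bundled with later R versions (snow provides clusterApplyLB() under the same name).

```r
library(parallel)  # assumption: snow offers the same function names

# Hypothetical stand-ins for myData and func1 from the post
myData <- array(rnorm(6 * 5 * 8), dim = c(6, 5, 8))
func1  <- function(v) mean(v)

cl <- makeCluster(2)

# One task per (row, column) cell, listed in column-major order
cells <- expand.grid(i = seq_len(dim(myData)[1]),
                     j = seq_len(dim(myData)[2]))

# Each task extracts the vector for one cell and applies the function to it.
# Results come back in the same order as the task indices.
res <- clusterApplyLB(cl, seq_len(nrow(cells)),
                      function(k, dat, cells, f) {
                        f(dat[cells$i[k], cells$j[k], ])
                      },
                      myData, cells, func1)

stopCluster(cl)

# Reassemble into the shape that apply(myData, c(1, 2), func1) would return
fm_lb <- matrix(unlist(res), nrow = dim(myData)[1])
```

This sketch ships the whole array with every task, which is wasteful for large data; exporting the data to the workers once (for example with clusterExport()) and grouping several cells into each task would reduce the communication overhead, at the price of coarser balancing.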

All help is highly appreciated,
Gang



