[R] parallel computation with plyr 1.2.1

Dylan Beaudette debeaudette at ucdavis.edu
Thu Sep 16 19:11:55 CEST 2010


Hi,

I have been trying to use the new .parallel argument with the most recent 
version of plyr [1] to speed up some tasks. I can run the example in the NEWS 
file [1], and it seems to be working correctly. However, R will only use a 
single core when I try to apply this same approach with ddply(). 

1. http://cran.r-project.org/web/packages/plyr/NEWS

Watching my CPUs while running the example below, I see that only a single 
core is used with both .parallel=FALSE and .parallel=TRUE, and the two runs 
take about the same amount of time. Is there a limitation in how ddply() 
dispatches parallel jobs, or is this task not suitable for parallel 
computing?

Cheers,
Dylan


Here is an example:

library(plyr)
library(doMC)
registerDoMC(cores=2)

# example data
d <- data.frame(y=rnorm(2000), id=rep(letters[1:4], each=500))

# function that wastes some time
f <- function(x) {
  # pre-allocate a numeric vector for the 10000 resampled means
  m <- numeric(10000)
  for(i in 1:10000) {
    m[i] <- mean(sample(x$y, 100))
  }
  mean(m)
}

system.time(ddply(d, .(id), .fun=f, .parallel=FALSE))
#  user  system elapsed 
#  2.740   0.016   2.766 

system.time(ddply(d, .(id), .fun=f, .parallel=TRUE))
#  user  system elapsed 
#  2.720   0.000   2.726 
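
To separate a backend problem from a ddply() problem, a small check along 
these lines might help. It is only a sketch: it assumes the foreach package 
is installed alongside doMC and that forking is available (so not Windows), 
and it does not touch ddply() at all. getDoParName() and getDoParWorkers() 
report what foreach thinks is registered, and the %dopar% loop records the 
process id that handled each task.

```r
library(foreach)
library(doMC)

# same registration as in the example above
registerDoMC(cores = 2)

# name and worker count of the backend that %dopar% will use
getDoParName()
getDoParWorkers()

# each task returns the id of the process it ran in
pids <- foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()

# number of distinct processes that did the work
length(unique(pids))
```

If more than one distinct process id comes back, the backend itself is 
forking workers, which would point at how ddply() hands work to it rather 
than at doMC.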





-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341


