[R] parallel bootstrap linear model on multicore mac (re-post)

Anthony Dick adick at fiu.edu
Thu Feb 24 20:23:52 CET 2011


Hello all,

I am re-posting my previous question with a simpler, more transparent, 
commented code.

I have been banging my head against this problem and wondered whether 
anyone could lend a hand. I want to parallelize a bootstrap of a linear 
mixed model on my 8-core Mac. Below is the process that I want to run in 
parallel (namely, the boot.out<-boot(dat.res,boot.fun, R = nboot) 
command). It extends the bootstrapping-linear-models example in Venables 
and Ripley to lmer. Please excuse my rather terrible programming skills; 
I am always open to suggestions. Below the example I describe the 
methods I have tried.

library(boot)
library(lme4)
dat <- read.table("http://www2.fiu.edu/~adick/downloads/toy2.dat", header = TRUE)
nboot <- 1000      # number of bootstraps
x <- dat[, 2]      # IV number 1
y <- dat[, 4]      # DV
z <- dat[, 3]      # IV number 2
subj <- dat[, 1]   # random factor
# function to resample residuals
boot.fun <- function(data, i) {
  d <- data
  # populate new y values based on resampled residuals
  d$y <- d$fitted + d$res[i]
  # update the linear model and output the coefficients
  as.numeric(coef(update(fit, data = d))[1][[1]][1, c(1:4)])
}
fit <- lmer(y ~ x * z + (1 | (subj)))   # the linear model
# add residuals and fitted values to dat
dat.res <- data.frame(y, x, z, subj, res = resid(fit), fitted = fitted(fit))
# run the bootstrap using the boot.fun
boot.out <- boot(dat.res, boot.fun, R = nboot)
boot.out

Methods attempted:

Using the multicore package, I tried 
boot.out<-collect(parallel(boot(dat.res,boot.fun, R = nboot))). This 
returned a correct result, but did not speed things up. Not sure why...
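
A possible explanation, offered tentatively: parallel() in multicore 
forks a single child process to evaluate the whole expression, so the 
entire boot() call still runs serially inside that one child. A minimal 
sketch of splitting the replicates across forked children instead is 
given here. It rests on several assumptions that are not part of the 
original code: the base "parallel" package stands in for multicore 
(mclapply has the same interface), simulated data and a plain lm() fit 
stand in for toy2.dat and the lmer model so that the sketch runs without 
the data file, and ncores and chunks are illustrative names.

```r
library(boot)
library(parallel)

# Simulated stand-in for toy2.dat (assumption: not the real data)
set.seed(1)
n <- 60
d0 <- data.frame(x = rnorm(n), z = rnorm(n))
d0$y <- 1 + 0.5 * d0$x - 0.3 * d0$z + rnorm(n)

# lm() stands in for lmer() to keep the sketch self-contained
fit <- lm(y ~ x * z, data = d0)
d0$res    <- resid(fit)
d0$fitted <- fitted(fit)

# Same residual-resampling statistic as in the post
boot.fun <- function(data, i) {
  data$y <- data$fitted + data$res[i]
  as.numeric(coef(update(fit, data = data)))
}

ncores <- 2                      # hypothetical; raise to match the machine
chunks <- rep(100, ncores)       # replicates handled by each child

RNGkind("L'Ecuyer-CMRG")         # gives each child its own RNG stream
set.seed(123)
parts <- mclapply(chunks,
                  function(r) boot(d0, boot.fun, R = r)$t,
                  mc.cores = ncores)

boot.t <- do.call(rbind, parts)  # one row per replicate, one column per coefficient
apply(boot.t, 2, sd)             # bootstrap standard errors
```

Each child runs an ordinary boot() call on its share of the replicates, 
and only the matrices of replicate statistics are combined afterwards.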

I also tried snowfall and snow. While I can create a cluster and run 
simple processes (e.g., provided example from literature), I can't get 
the bootstrap to run. For example, using snow:

cl <- makeCluster(8)
clusterSetupRNG(cl)
clusterEvalQ(cl,library(boot))
clusterEvalQ(cl,library(lme4))
boot.out<-clusterCall(cl,boot(dat.res,boot.fun, R = nboot))
stopCluster(cl)

returns the following error:

Error in checkForRemoteErrors(lapply(cl, recvResult)) :
   8 nodes produced errors; first error: could not find function "fun"
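
One likely reading of this error: clusterCall(cl, fun, ...) expects a 
function as its second argument, whereas boot(dat.res, ...) is evaluated 
on the master first and its result is then sent to the workers in place 
of a function. The workers also have no copy of the data, boot.fun, or 
the fitted model unless these are passed or exported. A minimal sketch 
of the function form follows, under these assumptions: the base 
"parallel" package is used in place of snow, simulated data and lm() 
stand in for toy2.dat and lmer, and the extra argument base.fit is a 
hypothetical name for handing the fitted model to the statistic.

```r
library(parallel)

# Simulated stand-in for toy2.dat (assumption: not the real data)
set.seed(1)
n <- 60
d0 <- data.frame(x = rnorm(n), z = rnorm(n))
d0$y <- 1 + 0.5 * d0$x - 0.3 * d0$z + rnorm(n)

# lm() stands in for lmer() to keep the sketch self-contained
fit <- lm(y ~ x * z, data = d0)
d0$res    <- resid(fit)
d0$fitted <- fitted(fit)

# The fitted model is an explicit argument, so nothing has to be
# looked up in a worker's (empty) global environment
boot.fun <- function(data, i, base.fit) {
  data$y <- data$fitted + data$res[i]
  as.numeric(coef(update(base.fit, data = data)))
}

cl <- makeCluster(2)              # 2 workers; raise to match the cores
clusterSetRNGStream(cl, 123)      # independent RNG stream per worker
clusterEvalQ(cl, library(boot))   # load boot on every worker

# Second argument is a function; the remaining arguments are shipped to
# each worker and passed to it. boot() forwards base.fit to boot.fun.
parts <- clusterCall(cl,
                     function(d, f, m, r) boot(d, f, R = r, base.fit = m)$t,
                     d0, boot.fun, fit, 100)
stopCluster(cl)                   # the cluster object must be supplied

boot.t <- do.call(rbind, parts)   # 200 x 4 matrix of replicate coefficients
```

With an lmer model, the workers would additionally need 
clusterEvalQ(cl, library(lme4)), as in the attempt above.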

I am stuck and at the limit of my programming knowledge and am punting 
to the R-help list. I need to run this process thousands of times, which 
is the reason to make it parallel. Any suggestions are much appreciated.


Anthony


-- 
Anthony Steven Dick, Ph.D.
Assistant Professor
Department of Psychology
Florida International University
Modesto A. Maidique Campus DM 296B
11200 S.W. 8th Street
Miami, FL 33199
Phone: 305-348-4202
Lab Phone: 305-348-9057 or 305-348-9055 (I am usually here)
Fax: 305-348-3879
Email: adick at fiu.edu
Webpage: http://www.fiu.edu/~adick


