[R] parallel bootstrap linear model on multicore mac (re-post)

hiemstra hiemstra at knmi.nl
Fri Mar 4 09:48:00 CET 2011


On 03/02/2011 11:38 PM, Anthony Dick wrote:
> Hello all,
>
> I am re-posting my previous question with a simpler, more transparent,
> commented code.
>
> I have been ramming my head against this problem, and I wondered if
> anyone could lend a hand. I want to make parallel a bootstrap of a
> linear mixed model on my 8-core mac. Below is the process that I want to
> make parallel (namely, the boot.out<-boot(dat.res,boot.fun, R = nboot)
> command). This is an extension to lmer of the bootstrapping linear
> models example in Venables and Ripley. Please excuse my rather terrible
> programming skills. I am always open to suggestions. Below the example I
> describe what methods I have tried.
>
> library(boot)
> library(lme4)
> dat<-read.table("http://www2.fiu.edu/~adick/downloads/toy2.dat", header = T)
> nboot<-1000 # number of bootstraps
> attach(dat)
> x<-dat[,2] # IV number 1
> y<-dat[,4] # DV
> z<-dat[,3] # IV number 2
> subj<-dat[,1] # random factor
> boot.fun<-function(data,i) { # function to resample residuals
>                d<-data
>                # populate new y values based on resampled residuals
>                d$y<- d$fitted+d$res[i]
>                # update the linear model and output the coefficients
>                as.numeric(coef(update(fit,data=d))[1][[1]][1,c(1:4)])
>                }
> fit<-lmer(y~x*z + (1|(subj))) # the linear model
> # add residuals and fitted values to dat
> dat.res<-data.frame(y,x,z,subj, res=resid(fit), fitted=fitted(fit))
> # run the bootstrap using the boot.fun
> boot.out<-boot(dat.res,boot.fun, R = nboot)
> boot.out
>
> Methods attempted:
>
> Using the multicore package, I tried
> boot.out<-collect(parallel(boot(dat.res,boot.fun, R = nboot))). This
> returned a correct result, but did not speed things up. Not sure why...
Hi Anthony,

When the individual calls passed on to the cluster are very short (which
might be the case for your bootstrap), the overhead of running them in
parallel becomes large relative to the work each call does, negating the
benefit of parallel processing. This could explain the lack of speed
improvement. A solution could be to send sets of bootstrap calls to the
cluster rather than individual calls. This decreases the overhead of
running in parallel.
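
As a rough sketch of what sending sets of calls could look like: the code
below assumes a current R where the parallel package and its mclapply()
are available (it forks, so it will not run in parallel on Windows). The
toy data, the plain lm() standing in for lmer(), and the names one.rep
and run.chunk are stand-ins for illustration, not your actual setup.

```r
library(parallel)  # assumption: R >= 2.14, where parallel ships with R

RNGkind("L'Ecuyer-CMRG")  # lets mclapply give each worker its own RNG stream
set.seed(1)

# Toy data standing in for dat.res (hypothetical; lm instead of lmer)
n <- 200
d0 <- data.frame(x = rnorm(n), z = rnorm(n))
d0$y <- 1 + 2 * d0$x - d0$z + rnorm(n)
fit <- lm(y ~ x * z, data = d0)
d0$res <- resid(fit)
d0$fitted <- fitted(fit)

# One residual-bootstrap replicate: resample residuals, refit, return coefs
one.rep <- function(d) {
  d$y <- d$fitted + sample(d$res, replace = TRUE)
  as.numeric(coef(lm(y ~ x * z, data = d)))
}

# One chunk = many replicates handled inside a single worker call
run.chunk <- function(n.rep, d) {
  t(replicate(n.rep, one.rep(d)))  # n.rep rows, one column per coefficient
}

nboot <- 1000
ncores <- 4
# Split nboot into ncores chunks; any remainder goes to the first chunk
chunks <- rep(nboot %/% ncores, ncores)
chunks[1] <- chunks[1] + nboot %% ncores

# Only ncores calls cross the process boundary, instead of nboot
res <- mclapply(chunks, run.chunk, d = d0, mc.cores = ncores)
boot.t <- do.call(rbind, res)  # all replicates stacked into one matrix
dim(boot.t)
```

Each worker then does a few hundred refits per call, so the cost of
starting the worker and sending back results is paid only once per chunk.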

cheers,
Paul
> I also tried snowfall and snow. While I can create a cluster and run
> simple processes (e.g., provided example from literature), I can't get
> the bootstrap to run. For example, using snow:
>
> cl<- makeCluster(8)
> clusterSetupRNG(cl)
> clusterEvalQ(cl,library(boot))
> clusterEvalQ(cl,library(lme4))
> boot.out<-clusterCall(cl,boot(dat.res,boot.fun, R = nboot))
> stopCluster(cl)
>
> returns the following error:
>
> Error in checkForRemoteErrors(lapply(cl, recvResult)) :
>     8 nodes produced errors; first error: could not find function "fun"
>
> I am stuck and at the limit of my programming knowledge and am punting
> to the R-help list. I need to run this process thousands of times, which
> is the reason to make it parallel. Any suggestions are much appreciated.
>
>
> Anthony
>


-- 
Paul Hiemstra, MSc
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


