[R] assigning vector or matrix sparsely (for use with mclapply)

ivo welch ivo.welch at gmail.com
Tue Mar 27 00:28:11 CEST 2012

Dear R wizards---

I have a wrapper on mclapply() that makes it a little easier for me to
do multiprocessing.  (Posting this may make life easier for other
googlers.)  I pass a data frame, a vector that tells me what rows
should be recomputed, and the function; and I get back a vector or
matrix of answers.

   d <- data.frame( id=1:6, val=11:16 )
   loc <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)  ## example row selection (illustrative values)
   v1 <- mc.byselectrows( d, loc, function(x) x[,2]^2 )
   v2 <- mc.byselectrows( d, loc, function(x) cbind(x[,2]^2, x[,2]^3) )

library(parallel)  ## for mclapply()

mc.byselectrows <- function(data.in, recalclist, FUN, ...) {

  data.notdone <- data.in[recalclist, ]
  ## progress message, written to stderr
  cat("[mc.byselectrows: ", nrow(data.notdone), "rows to be recomputed out of",
      nrow(data.in), "]\n", file = stderr())

  ## apply FUN to one selected row at a time; always hand back a matrix
  FUN.ON.ROWS <- function(.index, ...)
    as.matrix(FUN(data.notdone[.index, ], ...))
  soln <- mclapply( as.list(seq_len(nrow(data.notdone))), FUN.ON.ROWS, ... )
  rv <- do.call("rbind", soln)  ## omits naming.
  if (ncol(rv) == 1) rv <- as.vector(rv)
  rv
}

This works fine, except that I want to get NAs in the return
positions that were not recalculated.  Then I can write

  newdata$y <- ifelse( is.na(olddata$y),
                       mc.byselectrows( olddata, is.na(olddata$y), fun.calc.y ),
                       olddata$y )

I can do this very inelegantly, of course.  I can merge recalclist
into data.in and then write a loop that substitutes for the do.call to
rbind.  Yikes.  Or I could do the recalclist contingency inside
FUN.ON.ROWS, but this is costly in terms of execution time.  Are there
obvious solutions?  Advice appreciated.
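
One possible approach, sketched here under some assumptions rather than offered as the definitive answer: preallocate a full-size result filled with NA, then assign the recomputed rows into it by index. The name mc.byselectrows.na is made up for illustration, and the sketch assumes FUN returns exactly one row of output per input row and that recalclist is a logical mask without NAs or a vector of positive row numbers.

```r
library(parallel)  # provides mclapply(); it forks, so mc.cores > 1 needs a Unix-alike

# Hypothetical NA-padded variant of mc.byselectrows(); the name is made up here.
mc.byselectrows.na <- function(data.in, recalclist, FUN, ...) {
  # Turn a logical mask or a vector of positive row numbers into row numbers.
  idx <- seq_len(nrow(data.in))[recalclist]
  # Nothing to recompute: the column count is unknown, so return an NA vector.
  if (length(idx) == 0) return(rep(NA, nrow(data.in)))

  # One task per selected row, as in the original wrapper.
  FUN.ON.ROW <- function(i, ...) as.matrix(FUN(data.in[i, , drop = FALSE], ...))
  soln <- mclapply(idx, FUN.ON.ROW, ...)
  done <- do.call("rbind", soln)

  # Preallocate the full-size result as NA, then fill only the selected rows.
  rv <- matrix(NA, nrow = nrow(data.in), ncol = ncol(done))
  rv[idx, ] <- done
  if (ncol(rv) == 1) rv <- as.vector(rv)
  rv
}

d   <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)  # assumed example selection
v1  <- mc.byselectrows.na(d, loc, function(x) x[, 2]^2)
v2  <- mc.byselectrows.na(d, loc, function(x) cbind(x[, 2]^2, x[, 2]^3))
```

With a return value padded this way, the ifelse() line above gets arguments of equal length. For the single-column case, indexed assignment on a copy of the old column, along the lines of y[sel] <- mc.byselectrows(olddata, sel, fun.calc.y), may be simpler still, since it needs no padding at all.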


Ivo Welch (ivo.welch at gmail.com)
