[R] efficient coding with foreach and bigmemory

Jay Emerson jayemerson at gmail.com
Fri Sep 30 19:15:17 CEST 2011


First, we strongly recommend 64-bit R.  Otherwise, you may not be able
to scale up as far as you would like.

Second, as I think you realize, with big objects you may have to do
things in chunks.  I generally recommend working a column at a time
rather than in blocks of rows if possible (better performance,
particularly when filebacking is used because the matrix exceeds
RAM), and you may find that an alternative data organization can
really pay off.  Keep an open mind.
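
For example, here is a minimal sketch of column-at-a-time processing
(the dimensions, file names, and the centering operation are just
illustrative assumptions):

library(bigmemory)

# A filebacked matrix, for data too large to hold comfortably in RAM:
m <- filebacked.big.matrix(nrow = 1e6, ncol = 100, type = "double",
                           backingfile = "m.bin",
                           descriptorfile = "m.desc")

for (j in seq_len(ncol(m))) {
  x <- m[, j]             # pull one column into RAM
  m[, j] <- x - mean(x)   # e.g. center it, then write it back
}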

Third, you really need to avoid this runif(1,...) usage; it can't
possibly be efficient.  If a single call to runif() doesn't work,
then certainly break the problem into chunks, but going down to
chunks of size 1 makes no sense.
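
Something along these lines should be far faster (the total size and
the chunk size here are illustrative assumptions):

library(bigmemory)

n <- 1e8
m <- big.matrix(nrow = n, ncol = 1, type = "double")

chunk <- 1e6
for (start in seq(1, n, by = chunk)) {
  end <- min(start + chunk - 1, n)
  m[start:end, 1] <- runif(end - start + 1)   # one call per chunk,
                                              # not one call per value
}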

Fourth, although you aren't there yet, once you get to the point
where you are trying to do things in parallel with foreach and
bigmemory, you *may* need the following pattern (re-attaching the
matrix via its descriptor inside the foreach loop) to make use of
the shared memory properly:

mdesc <- describe(m)               # descriptor for the shared big.matrix
foreach(j = 1:ncol(m)) %dopar% {   # the loop index here is just an example
  require(bigmemory)
  m <- attach.big.matrix(mdesc)    # re-attach inside each worker
  # ... now operate on m
}

I say *may* because the doMC backend (not available on Windows) does
not require this, but the other backends do; without it, the workers
will not be able to properly address the shared-memory or filebacked
big.matrix.  Some documentation on bigmemory.org may help, and feel
free to email us directly.
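
For concreteness, here is one possible end-to-end version using the
doParallel backend (the backend choice, cluster size, matrix
contents, and column-sum operation are all illustrative assumptions):

library(bigmemory)
library(foreach)
library(doParallel)

cl <- makeCluster(2)      # a non-doMC backend, so re-attaching is needed
registerDoParallel(cl)

m <- big.matrix(nrow = 1000, ncol = 10, type = "double", init = 1)
mdesc <- describe(m)

sums <- foreach(j = 1:ncol(m), .combine = c) %dopar% {
  require(bigmemory)
  m <- attach.big.matrix(mdesc)   # workers re-attach via the descriptor
  sum(m[, j])                     # e.g. one column sum per task
}

stopCluster(cl)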

Jay


-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay


