[Rd] One possible use for threads in R
ross at biostat.ucsf.edu
Thu Feb 8 22:02:36 CET 2007
I have been using R on a cluster with some work that does not
parallelize neatly because the time individual computations take
varies widely and unpredictably.
So I've considered implementing a work-stealing arrangement, in which
idle nodes grab tasks from busy nodes. It might also be useful for
nodes to communicate results to each other.
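
To make the work-stealing idea concrete, here is a minimal sketch. It is
written in Python rather than R, since R has no threads, and it uses threads
on one machine to stand in for cluster nodes. The names run_work_stealing,
next_task and slow_double are made up for illustration, and the "steal from
the busiest peer" rule is just one plausible policy, not something settled
in this post.

```python
import threading
import time
from collections import deque


def run_work_stealing(task_lists, work):
    """Run every task; a worker with an empty queue steals from a peer.

    task_lists: one list of tasks per worker (a hypothetical initial split).
    work: function applied to each task.
    Returns a list of (worker_id, task, result) tuples.
    """
    queues = [deque(tasks) for tasks in task_lists]
    lock = threading.Lock()  # guards every queue and the results list
    results = []

    def next_task(me):
        with lock:
            if queues[me]:
                return queues[me].popleft()  # own work first
            # Idle: pick the peer with the most queued tasks.
            victim = max(range(len(queues)), key=lambda i: len(queues[i]))
            if queues[victim]:
                return queues[victim].pop()  # steal from the far end
            return None  # no queued work anywhere

    def worker(me):
        while True:
            task = next_task(me)
            if task is None:
                return  # tasks never create new tasks, so it is safe to stop
            value = work(task)
            with lock:
                results.append((me, task, value))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(queues))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def slow_double(ms):
    """Toy computation whose run time varies with its input."""
    time.sleep(ms / 1000.0)
    return ms * 2


if __name__ == "__main__":
    # Worker 0 starts with all the work; worker 1 starts idle and must steal.
    done = run_work_stealing([[5, 1, 20, 2, 8, 1], []], slow_double)
    for worker_id, task, value in done:
        print(worker_id, task, value)
```

On a real cluster the single lock would be replaced by messages between
nodes, which is where the communication question below comes in.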
My first thought on handling this was to have one R thread that
managed the communication, and 2 that managed computation (each node
Previous discussion has noted that R is not multi-threaded, and also
asked what use cases multi-threading might address. So here's a use
case.
The advantage of having R doing the communication is that it's easy to
pass R-level objects around using, e.g., Rmpi. The advantage of
having the communicator and the calculators share the same process is
that work and information the communicator got would be immediately
available to the calculators.
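
The one-communicator, two-calculator layout might look roughly like the
sketch below. Again this is Python, not R, and only an illustration: run_node,
incoming and compute are hypothetical names, and the communicator simply reads
from a local sequence where a real one would receive work from other nodes
(for example via MPI). The point is that anything the communicator puts on the
shared queue is visible to the calculators at once, with no copying between
processes.

```python
import queue
import threading


def run_node(incoming, compute, n_calculators=2):
    """One communicator thread feeds n_calculators compute threads.

    incoming: iterable of work items (stand-in for messages from other nodes).
    compute: function applied to each item.
    Returns a list of (item, result) pairs in completion order.
    """
    tasks = queue.Queue()
    results = queue.Queue()
    stop = object()  # sentinel telling a calculator to exit

    def communicator():
        for item in incoming:
            tasks.put(item)  # immediately available to the calculators
        for _ in range(n_calculators):
            tasks.put(stop)  # one sentinel per calculator

    def calculator():
        while True:
            item = tasks.get()
            if item is stop:
                return
            results.put((item, compute(item)))

    threads = [threading.Thread(target=communicator)]
    threads += [threading.Thread(target=calculator)
                for _ in range(n_calculators)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All threads have finished, so draining the queue here is safe.
    out = []
    while not results.empty():
        out.append(results.get())
    return out


if __name__ == "__main__":
    print(sorted(run_node(range(5), lambda x: x * x)))
```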
Other comments suggested that IPC is fast (though one comment referred
specifically to Linux, and the cluster runs OS X), so it may be quite
workable to run the communicator and each calculator as separate
processes instead of threads.
I'm not at all sure the implementation I sketched above is the best
approach to this problem (or even that it would be if R were
multi-threaded), but it does seem to me this might be one area where
threads would be handy in R.