[R] Passing data among multiple instances

Warren Young warren at etr-usa.com
Wed Feb 4 16:02:46 CET 2009


Feng Li wrote:
> 
> I have two R instances running at the same time, 

On the same computer, or on different computers?

Is the number of Rs likely to change, or will it always be just the two?

Is this a simple one-off problem, or are you breaking the problem up 
into pieces so you can throw lots of hardware at it?

> Is there a simpler way to pass the data in A to B?

Perhaps the simplest option is to write the data structure to a file, 
using any of the several R ways to do that.  When instance 2 sees that a 
file is available, it slurps its contents in and works on it.  The hard 
part is making the second instance wait until the whole file is written 
out by the first.  You wouldn't want it to read in half the file then 
hit the end because the first process hasn't finished writing out the 
file.  I don't see any good mechanism in R to fix this.
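
One common workaround, not something R provides as a ready-made feature, is to write the file under a temporary name and rename it only once it is complete, so the reader never sees a half-written file under the final name.  Here is a minimal sketch of that idea.  The function names publish_object() and wait_for_object() and the ".tmp" suffix are made up for illustration, and it assumes both names live on the same filesystem, where a rename normally happens in one step.

```r
# Writer side (instance A): write under a temporary name, then rename.
# publish_object() is a hypothetical helper, not part of R.
publish_object <- function(obj, path) {
  tmp <- paste0(path, ".tmp")
  saveRDS(obj, tmp)                 # the full file is written here
  ok <- file.rename(tmp, path)      # only now does the final name appear
  if (!ok) stop("could not rename ", tmp, " to ", path)
  invisible(path)
}

# Reader side (instance B): poll until the final name exists, then read it.
# wait_for_object() is also hypothetical; timeout and interval are in seconds.
wait_for_object <- function(path, timeout = 60, interval = 0.5) {
  deadline <- Sys.time() + timeout
  while (!file.exists(path)) {
    if (Sys.time() > deadline) stop("timed out waiting for ", path)
    Sys.sleep(interval)
  }
  readRDS(path)
}
```

Instance A would call publish_object(mydata, "shared.rds") and instance B would call wait_for_object("shared.rds"), with both agreeing on the path ahead of time.  Polling is crude, but it keeps the example short.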

A more robust option is to use sockets.  This is suitable even within a 
single machine.  See ?make.socket.  This solves the "how do I know when 
I've got the full data structure" problem because the second process can 
just keep reading until the read fails or comes back empty, indicating 
that the remote peer closed the connection.  Once you have the data 
structure in string form, you can parse() and eval() it to get an R 
object suitable for munching on.  Figuring out how to pass the data 
might be the hardest part.  deparse() might be the easiest way.
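
To make the socket idea concrete, here is a rough sketch.  The helper names (object_to_text, text_to_object, send_object, receive_object) and the chunk size are my own inventions.  Only the deparse/parse round trip at the bottom is actually exercised; the two socket helpers assume the sender is started first in one R session and the receiver in another, and that a closed connection shows up as either an empty read or an error.

```r
# Turn an object into one string of R source text, and back again.
object_to_text <- function(obj) paste(deparse(obj), collapse = "\n")
text_to_object <- function(txt) eval(parse(text = txt))

# Sender (hypothetical helper): wait for one peer, send the text, close.
send_object <- function(obj, port) {
  s <- make.socket(port = port, server = TRUE)  # blocks until a peer connects
  on.exit(close.socket(s))
  write.socket(s, object_to_text(obj))
  invisible(NULL)
}

# Receiver (hypothetical helper): read chunks until the peer closes.
receive_object <- function(port, host = "localhost") {
  s <- make.socket(host, port)
  on.exit(close.socket(s))
  chunks <- character(0)
  repeat {
    # Treat a read error the same as an empty read: the peer has gone away.
    chunk <- tryCatch(read.socket(s, maxlen = 4096L),
                      error = function(e) "")
    if (!nzchar(chunk)) break
    chunks <- c(chunks, chunk)
  }
  text_to_object(paste(chunks, collapse = ""))
}

# Round trip without any socket, to show that the text form is enough.
x <- list(a = 1:3, b = c("u", "v"),
          m = matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2))
y <- text_to_object(object_to_text(x))
stopifnot(identical(x, y))
```

Keep in mind that eval() runs whatever text arrives, so this is only reasonable between processes you control.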

If you're hoping to scale this up to lots of processes, look into Rmpi.  
This provides a very clean way for an R program on one computer to 
start slaves on other computers and then pass data to them in native R 
structures.  Setting up MPI itself is not trivial, however.  It's best 
when you already have a cluster of computers linked with MPI.
