[Rd] How to execute R scripts simultaneously from multiple threads

Martin Morgan mtmorgan at fhcrc.org
Thu Jan 4 21:07:09 CET 2007


Vladimir, Jeff, et al.,

This is more pre-publicity than an immediately available solution, but
we've been working on the 'RWebServices' project. The R evaluator and
user functions get wrapped in Java, and the Java is exposed as a web
service. We use ActiveMQ to broker transactions between the front-end
web service and persistent back-end R workers. The workers rely on
SJava to wrap and evaluate R.
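
To give a flavor of the bridge, here is how the same SJava machinery
looks from the R side (a minimal sketch, assuming the standard SJava
calls):

    library(SJava)
    .JavaInit()                                # start the embedded JVM
    v <- .JavaConstructor("java.util.Vector")  # instantiate a Java object
    .Java(v, "add", "an R string")             # invoke a method from R
    .Java(v, "size")                           # round-trips 1 back to R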

Some of the features and opportunities of this system are:

- strongly typed functions in R, using the TypeInfo package (see the
  sketch below);
- native R-Java translation, using SJava and our own converters;
- a programmatic interface (i.e., as web services; this benefits from
  the use of S4 as a formal class system);
- scalable computation, through the addition of more / specialized
  workers in ActiveMQ; and
- access to the Java-based tools available for web service deployment.

Creating web services can be nearly automatic once the R functions are
appropriately typed. Our focus has mostly been on big-data computation,
which may be orthogonal to the needs of the original post.
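
To give a flavor of the typing step, something like the following (a
minimal sketch; 'ratio' is an arbitrary illustration, and this assumes
TypeInfo's SimultaneousTypeSpecification / TypedSignature interface):

    library(TypeInfo)

    ## an ordinary R function...
    ratio <- function(x, y) x / y

    ## ...annotated so that the arguments and return value have
    ## declared types, which the web-service layer can then rely on
    setTypeInfo("ratio",
                SimultaneousTypeSpecification(
                    TypedSignature(x = "numeric", y = "numeric"),
                    returnType = "numeric"))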

We will provide more information at the Directions in Statistical
Computing conference in mid-February, so please drop me a line if
you'd like to be kept up to date.

Martin
-- 
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org

Jeffrey Horner <jeff.horner at vanderbilt.edu> writes:

> Vladimir Dergachev wrote:
>> On Thursday 04 January 2007 4:54 am, Erik van Zijst wrote:
>>> Vladimir Dergachev wrote:
>>>> On Wednesday 03 January 2007 3:47 am, Erik van Zijst wrote:
>>>>> Apparently the R C API does not provide a mechanism for parallel
>>>>> execution.
>>>>>
>>>>> It is preferred that the solution not be based on multi-processing
>>>>> (like client/server), because that would introduce IPC overhead.
>>>> One thing to keep in mind is that IPC is very fast on Linux. So unless
>>>> you are making lots of calls to really tiny functions, this should not
>>>> be an issue.
>>> Using pipes or shared memory to pass things around to other processes on
>>> the same box is very fast indeed, but if we base our design around
>>> something like Rserve, which uses TCP, it could be significantly slower.
>>> Our R-based system will be running scripts in response to high-volume,
>>> real-time stock exchange data, so we do indeed expect lots of calls to
>>> many tiny functions.
>> 
>> Very interesting :) 
>> 
>> If you are running Rserve on another box, you will need to send the data
>> over Ethernet anyway (and will probably use TCP). If it is on the same
>> box and you use "localhost", the packets will go over the loopback
>> device, which is significantly faster.
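>>
>> For example, with the R client bundled with Rserve (a minimal sketch,
>> assuming the RSconnect / RSeval / RSclose client functions and an
>> Rserve daemon listening on the default port):
>>
>>     library(Rserve)                        # provides the R-side client
>>     conn <- RSconnect(host = "localhost")  # TCP, but over loopback
>>     RSeval(conn, quote(sum(rnorm(100))))   # evaluated in the server
>>     RSclose(conn)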
>
> I haven't looked at Rserve in a while, but I think it fires up an R
> interpreter in response to a client request and then sticks around to
> serve that same client additional requests. The question is how it
> manages all the R interpreters as demand varies.
>
> This issue is solved when you embed R into Apache (using the prefork
> MPM), as the pool of Apache child processes (each with its own R
> interpreter) expands and contracts on demand. Using this with the
> loopback device would be a nice solution:
>
> http://biostat.mc.vanderbilt.edu/RApacheProject
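>
> For instance, an R handler script under RApache might look something
> like this (a minimal sketch, assuming RApache's setContentType() /
> cat() handler interface and an Apache <Location> routed to the
> script):
>
>     setContentType("text/html")      # RApache-provided function
>     cat("<html><body>")
>     cat("2 + 2 = ", 2 + 2)           # each child runs its own R
>     cat("</body></html>")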
>
> Jeff
> -- 
> http://biostat.mc.vanderbilt.edu/JeffreyHorner
