[Rd] [R] multithreading calling from the rpy Python package

Andrew Piskorski atp at piskorski.com
Sat Oct 21 16:10:12 CEST 2006


On Fri, Oct 20, 2006 at 04:11:11PM +0200, René J.V. Bertin wrote:
> Since Python has been mentioned in this context: Could not Python's
> threading model and implementation serve as a guideline?

Why would you want to do that?  Does that model have some particular
synergy with R's design or current limitations?  (From later posts, I
suspect that the answer to that is "yes".)

I'm curious, what is the current state of the R implementation's
support for thread-safety, re-entrant code, and concurrency?  What
issues in the code are blocking improvement?  Is there a summary anywhere
more recent than the info here?:

  http://developer.r-project.org/RThreads/index.html

And perhaps most importantly, what are the main users or use cases
driving (or likely to drive in the future) improved multi-threading
support?  Who really wants it, and why?

It would certainly be nice to make R completely thread-safe (and
ideally, fully reentrant), so that it could be easily embedded into a
multi-threaded program.  But, that sounds like a non-mainstream use of
R, and thus unlikely to motivate most of the R core developers.

R also already has about 8 different packages for doing parallel
programming across multiple machines using MPI, PVM, etc.  I wonder
to what degree these obviate the need for multi-threaded R.
(Certainly not 100%, as most of those approaches will have vastly
worse latency than multi-threading.  But I wonder how much of the same
use cases they cover, 20%, or 80%?)

There's also the broader question of how or whether multi-threading
support would help, hinder, or otherwise interact with other potential
cool changes to R.  E.g., a high-performance byte-code interpreter, or
Lisp-like macros.

Any large single threaded application like R is going to have its own
particular obstacles to making the code thread-safe, and more to
making it reentrant and adding good multi-threading support of one
flavor or another.

I don't know what the particular obstacles for R are.  But if I was
looking for a GENERAL example of implementing a multi-threaded
interpreter, I certainly wouldn't choose Python.

Neither Python nor Ruby has real multi-threading; both are thread-safe
only via some sort of global lock.  (Only one thread can ever
run at a time, no matter how many CPUs your machine has.)  Perl
probably does support real multi-threading, but people say it hasn't
had much real world use.  Tcl has excellent multi-threading support,
using independent interpreters on top of OS threads (POSIX or Win32),
and has been in heavy use for many years.  I don't know about Lua,
JavaScript, the eleventy dozen different Scheme implementations,
etc. etc.

There are lots of different models for threading and concurrency.
OS-threads vs. user-mode threads is just one variable in the choice
space.  There's also message passing vs. shared memory, threads
vs. event-based programming, default shared-everything vs. default
shared-nothing, etc. etc.

In my experience, Tcl's "apartment model" for multi-threading is quite
nice to work with.  With it, you program each Tcl interpreter as if it
was a stand-alone shared-nothing process, which communicates with
other Tcl interpreters only through explicit shared memory and message
passing APIs.  Underneath, the C code sees the true shared-everything
threading implementation, but also has APIs to make working with it
easier.

(Note that Tcl implements that on top of POSIX and Win32 threads, but
you could implement the exact same script-level model on top of
inter-process shared memory.  Only the C code underneath would see any
difference.)
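
To make the script-level idea concrete, here is a rough sketch in
Python rather than Tcl.  It is not Tcl's actual API: the names
apartment and run_demo are made up for illustration, and Python's
queue.Queue stands in for Tcl's message passing.  Each worker keeps
its state private and can be reached only by sending it messages.

```python
import queue
import threading


def apartment(inbox, outbox):
    """One 'apartment': private state, reachable only through messages.

    Hypothetical helper for illustration; not part of any Tcl or R API.
    """
    totals = {}  # private to this thread; nothing else holds a reference
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel value: shut the apartment down
            break
        key, value = msg
        totals[key] = totals.get(key, 0) + value
        outbox.put((key, totals[key]))  # reply with the running total


def run_demo():
    """Send three messages to one apartment and collect its replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=apartment, args=(inbox, outbox))
    worker.start()
    for msg in [("a", 1), ("b", 10), ("a", 2)]:
        inbox.put(msg)
    replies = [outbox.get() for _ in range(3)]
    inbox.put(None)
    worker.join()
    return replies


if __name__ == "__main__":
    print(run_demo())
```

The calling code never touches the worker's totals directly, so no
locks appear at this level.  The queues do the synchronization
underneath, much as the C layer does in the Tcl case.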

I have never really used any other threading model extensively, so I
can't do a proper hands-on compare and contrast.  However, my
suspicion is that "shared nothing by default" models (like Tcl's
threading) are the better way to go, rather than "shared everything by
default" (like POSIX threads).  The Erlang and Mozart/Oz folks both
seem to think so, etc.
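
For contrast, a minimal sketch of the "shared everything by default"
style, again in Python and with made-up names (shared_counter is
hypothetical).  Every thread can see the same counter, so every
access has to be guarded by hand.

```python
import threading


def shared_counter(n_threads=4, n_increments=10000):
    """Increment one shared counter from several threads.

    Hypothetical example: the state is visible to all threads, so the
    programmer must remember to take the lock around every update.
    """
    counter = {"value": 0}  # shared by every thread started below
    lock = threading.Lock()

    def work():
        for _ in range(n_increments):
            with lock:  # without the lock, updates may be lost
                counter["value"] += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(shared_counter())
```

With the lock in place the result is always n_threads times
n_increments.  The burden is that nothing forces the programmer to
take the lock, which is the risk a shared-nothing default avoids.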

-- 
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/



