[R] Rserve and R to R communication

Tue Apr 10 23:17:51 CEST 2007

On 4/10/07, AJ Rossini <blindglobe at gmail.com> wrote:
> On Monday 09 April 2007 23:02, Ramon Diaz-Uriarte wrote:
>
> > (Yes, maybe I should check snowFT, but it uses PVM, and I recall a
> > while back there was a reason why we decided to go with MPI instead of
> > PVM).
>
> There is no reason that you can't run both MPI and PVM on the same cluster.
>

Yes, sure. We actually did that for a while. But we eventually settled on MPI.

> There is a particular reason that the first implementation we (Na Li, who did
> most of the work, and myself) made used PVM -- at the time (pre MPI 2) it was
> far more advanced than MPI as far as "interactive parallel computing", i.e.
> dispatch parallel functions interactively from the command line, creating and
> manipulating virtual machines on the fly.
>

Of course, you are right there. I think that might still be the case.
By the time we made our decision, MPI 2 was already out, and MPI
seemed "more like the current/future standard" than PVM. That feeling
was reinforced by seeing some key PVM people (e.g., Dongarra) also
involved in MPI, as well as by the very active development of MPI
implementations (e.g., LAM, MPICH, and later Open MPI). MPI also
seemed more like "the usual message passing" (which for us was, at
that time at least, a good thing), and we were already using MPI in
C++ code. So we decided to bet on MPI.

> Of course, most MPI implementations will save you loads of deci-seconds on
> transfer of medium size messages over the wire, but we weren't interested in

Oh, but those deci-seconds were never the reason we decided to choose
MPI. We are using R after all, not HPF :-).

> that particular aspect, more in saving days over the course of a one-off
> program (i.e. development time, which can be more painful than run-time).
>

Right. And of course we never thought MPI would cost us significantly
more development time than PVM (nor that any extra development time
would be compensated by the above-mentioned deci-seconds).
Moreover, most of these are not one-off programs, but web applications
(some of which have been running for over two years) where easy
debugging is crucial for us if we have to revisit the code 6 months
later (and for that we found papply considerably more useful than
snow; more below).

> Now, PVM had the necessary tools for fault tolerance -- though I thought that
> the recent MPI and newer message passing frameworks might have had some of
> that implemented.
>

Some MPI implementations have been developed that incorporate fault
tolerance. But I do not think that is easy with LAM/MPI or via Rmpi.
The problem is that once a node goes down, the whole LAM universe gets
screwed up.

> And remember, the point of snow was to provide platform-independent parallel
> code (for which it was the first, for nearly any language/implementation),
> not to run it like a bat-out-of-hell...  (we assumed it would be cheaper to
> buy more machines than to spend a few months finding a budget along with
> sharp programmers).
>

So using papply with Rmpi requires sharper programmers than using
snow? Hey, it is good to know I am that much smarter. I'll wear that
as a badge :-).

Anyway, papply (with Rmpi) is not, in my experience, any harder than
snow (with either rpvm or Rmpi). In fact, I find papply a lot simpler
than snow (clusterApply and clusterApplyLB). For one thing, debugging
is very simple, since papply becomes lapply if no LAM universe is
booted.
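
The fallback idea can be sketched outside R as well. What follows is
only a rough Python analogue of the pattern, not papply's actual
implementation: the names apply_with_fallback and square are
hypothetical, and a thread pool stands in for the booted LAM universe.
When no pool is passed, the call degrades to a plain serial map, which
is what makes stepping through the worker function in a debugger easy.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_with_fallback(
    func: Callable[[T], R],
    items: Iterable[T],
    executor: Optional[Executor] = None,
) -> List[R]:
    """Apply func to every item, in parallel only if an executor is given.

    Hypothetical helper: the executor plays the role that a running
    cluster would play; without one, the work runs serially in-process.
    """
    if executor is None:
        # No workers available: behave like an ordinary serial map,
        # so errors and breakpoints show up in the calling process.
        return [func(x) for x in items]
    # Workers available: dispatch to them; map() keeps the input order.
    return list(executor.map(func, items))


def square(x: int) -> int:
    """Toy worker function used only for illustration."""
    return x * x


if __name__ == "__main__":
    serial = apply_with_fallback(square, range(5))
    with ThreadPoolExecutor(max_workers=2) as pool:
        parallel = apply_with_fallback(square, range(5), executor=pool)
    # Both paths should give the same result, in the same order.
    print(serial == parallel)
```

The point of the design is that the calling code is identical in both
modes, so the same script can be debugged serially and then run on the
cluster unchanged.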

I see, though, that I might want to check PVM just for the sake of the
fault tolerance in snowFT.

Best,

R.

> best,
> -tony
>
> blindglobe at gmail.com
> Muttenz, Switzerland.
> "Commit early, commit often, and commit in a repository from which we can
> easily roll-back your mistakes" (AJR, 4Jan05).
>
>

-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz