[R] R on Beowulf cluster?

Liaw, Andy andy_liaw at merck.com
Tue Feb 8 12:25:00 CET 2005


> From: Barry Rowlingson
> 
> Liaw, Andy wrote:
> 
> > Thanks for the reply.  I'm aware of almost all of those things you
> > mentioned, and have played with some of them.  I'm not looking to do
> > distributed computing within R at this point.  What I'm after, though, is
> > not really addressed by any of them.  I want a user to be able to start an
> > interactive R session on a compute node that the system determines
> > (dynamically) to be most available at the time.  That's what `beorun' does,
> > but I couldn't get it to run R interactively.
> > 
> 
>   As an alternative suggestion, have you thought of using OpenMOSIX 
> instead of your Beowulf tools? With an OpenMOSIX cluster your process 
> runs on whichever processor is best for it, and jumps around as CPU 
> availability within the cluster changes. If you've got a cluster with 
> slow and fast processors, jobs will run on the fast processors first, no
> matter where the user is logged in.
> 
>   You can also add machines to the cluster without having to install 
> software on them - system calls are made back on the initial machine so
> the process always sees the files it had there. It's like plugging in a
> new CPU.
> 
>   We've been running an OpenMOSIX cluster for a few years now, and apart
> from the odd problem, it works really well. Older versions of R had
> trouble migrating, but anything above 1.7 quickly finds itself a nice 
> zippy host and uses that.
> 
>   Note this is x86 processors only, since it's a set of patches to the
> Linux kernel.
> 
> Baz

Baz,

Thanks for the suggestion.  oM would have been my first choice, too
(especially with Quantian).  The problem is that these are dual Opterons...

The Scyld Beowulf has many of the advantages that you mentioned: it's
trivial to bring up diskless compute nodes, and the head node can
dynamically determine which compute node has the most resources to run a job.
One thing that oM does that, AFAICT, Scyld Beowulf doesn't do (at least not
automatically) is migrate processes from node to node, but one can argue
whether that's necessarily a good thing (and it depends on how homogeneous
the nodes are).
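
To make "the node with the most resources" concrete, here is a minimal
Python sketch of one way such a choice could be made.  It is purely
illustrative and is not how Scyld Beowulf actually selects a node: the
NodeStatus fields, the node names, and the "lowest load per CPU, then most
free memory" rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    # All fields are hypothetical; a real cluster would report its own metrics.
    name: str          # node label, e.g. "n0"
    cpus: int          # number of CPUs on the node
    load_avg: float    # 1-minute load average
    free_mem_mb: int   # free memory in MB


def pick_node(nodes):
    """Return the node with the lowest load per CPU.

    Ties are broken in favour of the node with more free memory.
    This ranking rule is an assumption, not Scyld's algorithm.
    """
    if not nodes:
        raise ValueError("no compute nodes available")
    return min(nodes, key=lambda n: (n.load_avg / n.cpus, -n.free_mem_mb))


# Made-up status snapshot for three dual-CPU nodes.
nodes = [
    NodeStatus("n0", 2, 1.8, 3000),
    NodeStatus("n1", 2, 0.2, 1500),
    NodeStatus("n2", 2, 0.2, 3500),
]

print(pick_node(nodes).name)
```

With homogeneous nodes like these, load per CPU is a fair comparison; with
mixed hardware one would presumably also have to weight by CPU speed.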

I googled a bit more on the subject, and it seems like it's just about
impossible to get an interactive shell on a compute node on the Beowulf.  I
wonder if that also means no interactive R session...

BTW, a couple of people mentioned setting DISPLAY.  Yes, I did do that (both
using IP and host name), but alas, no luck.

Best,
Andy



