[Rd] Distributed computing

Fei Chen feic at stats.ox.ac.uk
Wed Mar 24 17:46:08 CET 2004


Thanks Brian for pointing this out...

Yes, indeed, my thesis involved distributed computing and R. It consisted
of two parts: a distributed scoping feature for limiting data movement, and
a parallel computing interface for speeding up computations. The former
used CORBA and the latter PVM (plus embedded R instances and ScaLAPACK).

There are three documents available describing this in more detail:

http://www.stats.ox.ac.uk/~feic/Rs/thesis.pdf
my thesis

http://www.stats.ox.ac.uk/~feic/Rs/shorter.pdf
a shorter summary

http://www.stats.ox.ac.uk/~feic/Rs/DSC2003.pdf
the DSC document Brian pointed out.

I haven't publicized this, mainly because the distributed scoping piece
involved modifying internal R code, most notably the R_eval() function,
which makes it rather non-portable. But if there's interest in how I did
things, I can certainly clean up my code and make it available. The
parallel engine part uses standard R, so it should be easier to set up.

Cheers,

fei





On Wed, 24 Mar 2004, Prof Brian Ripley wrote:

> Fei Chen implemented distribution of data and ScaLAPACK as part of his
> DPhil thesis, with a high-level R interface.  Moving data around is often
> the major limiting factor on large-scale model fitting (he was
> experimenting with glm's).
>
> There are two brief papers at
>
> http://www.isi-2003.de/guest/3427.pdf?MItabObj=pcoabstract&MIcolObj=uploadpaper&MInamObj=id&MIvalObj=3427&MItypeObj=application/pdf
>
> and in the DSC2003 proceedings (but the ci.tuwien server is currently not
> available, at least from here).
>
> Now Fei's process is complete, perhaps he will make the thesis available
> on line.
>
>
> On Tue, 23 Mar 2004 gte810u at mail.gatech.edu wrote:
>
> Quoting someone unnamed! --
>
> > > My inclination would be to, whenever possible, replace the core scalar
> > > libraries with compatible parallel versions (lapack -> scalapack),
> > > rather than make it an add-on package. If the R client code is general
> > > enough, and the make file can automatically find the parallel version,
> > > > then it's a simple matter of compiling with the parallel libs. (Don't
> > > know if this is possible at run-time.) No rewriting (high level) R code
> > > at all. I tried to contact the plapack folks here at UT about
> > > integrating with R, but it appears the project is no longer active.
> >
> > Unfortunately, there is a major complication to this approach:  the distribution
> > of data.  ScaLAPACK (and PLAPACK) requires the data to be distributed in a
> > special way before calculation functions can be called.  Given a generic R
> > matrix, we have to distribute the data before we can call ScaLAPACK functions on
> > it.  We then have to collect the answer before we can return it to R.  Because
> > of this serious overhead, replacing all LAPACK calls with ScaLAPACK calls would
> > not be recommended.  Future versions of our package [1] may include some type of
> > automatic benchmarking to decide when problems are large enough to be worth
> > sending to ScaLAPACK.
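
To make the distribution step above concrete, here is a minimal sketch of
the 2D block-cyclic layout that ScaLAPACK expects. It is plain Python for
illustration only, not code from ScaLAPACK, the Parallel-R package or the
thesis. The helper names (owner, local_index, distribute, collect), the
single square block size and the choice of process 0 as owner of the first
block are all assumptions made for the example.

```python
def owner(global_index, block_size, num_procs):
    """Grid coordinate of the process owning a global row (or column) index.

    Assumes the first block is placed on process 0 (hypothetical choice).
    """
    return (global_index // block_size) % num_procs


def local_index(global_index, block_size, num_procs):
    """Position of a global index inside its owner's local storage."""
    full_cycles = global_index // (block_size * num_procs)
    return full_cycles * block_size + global_index % block_size


def distribute(matrix, block_size, grid_rows, grid_cols):
    """Scatter a dense matrix (list of lists) over a process grid.

    Returns a dict mapping each (p, q) grid position to that process's
    local entries, keyed by local (row, column) index.
    """
    local = {(p, q): {} for p in range(grid_rows) for q in range(grid_cols)}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            p = owner(i, block_size, grid_rows)
            q = owner(j, block_size, grid_cols)
            li = local_index(i, block_size, grid_rows)
            lj = local_index(j, block_size, grid_cols)
            local[(p, q)][(li, lj)] = value
    return local


def collect(local, n_rows, n_cols, block_size, grid_rows, grid_cols):
    """Gather the local pieces back into a single dense matrix."""
    matrix = [[None] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            p = owner(i, block_size, grid_rows)
            q = owner(j, block_size, grid_cols)
            li = local_index(i, block_size, grid_rows)
            lj = local_index(j, block_size, grid_cols)
            matrix[i][j] = local[(p, q)][(li, lj)]
    return matrix
```

Every element is touched once on the way out and once on the way back, on
top of whatever the computation itself costs. For small matrices that round
trip can easily outweigh any gain from the parallel routine.
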
> >
> >
> > David Bauer
> >
> > [1] http://www.aspect-sdm.org/Parallel-R/
> >
> >
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


