[R] Parallel R

Luke Tierney luke at stat.uiowa.edu
Mon Jun 30 16:48:25 CEST 2008

On Mon, 30 Jun 2008, Juan Pablo Romero Méndez wrote:

> Thanks!
> It turned out that Rmpi was a good option for this problem after all.

To help with improving snow I'd be interested to hear more about why
Rmpi worked for you but snow did not.

> Nevertheless, pnmath seems very promising, although it doesn't load on my system:
>> library(pnmath)
> Error in dyn.load(file, DLLpath = DLLpath, ...) :
>  unable to load shared library
> '/home/jpablo/extra/R-271/lib/R/library/pnmath/libs/pnmath.so':
>  libgomp.so.1: shared object cannot be dlopen()ed
> Error: package/namespace load failed for 'pnmath'
> I find it odd, because libgomp.so.1 is in /usr/lib, so R should find it.

Could you tell us the OS and gcc version you are using? We are
starting to look at folding this into base R and this may help with
figuring out configuration issues.
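
One way to collect that information from within R is sketched below.
The helper name report_build_env is made up for illustration, and the
sketch assumes gcc is on the PATH (it reports NA otherwise).

```r
# Hypothetical helper (not part of R or pnmath): gather the OS, R and
# gcc details in one list.
report_build_env <- function() {
  gcc <- unname(Sys.which("gcc"))        # "" if gcc is not on the PATH
  gcc_version <- if (nzchar(gcc)) {
    # First line of `gcc --version` carries the version string.
    system2(gcc, "--version", stdout = TRUE, stderr = TRUE)[1]
  } else {
    NA_character_
  }
  list(
    r_version = R.version.string,
    platform  = R.version$platform,
    os        = unname(Sys.info()[c("sysname", "release")]),
    gcc       = gcc_version
  )
}

str(report_build_env())
```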

The error probably means what it says: libgomp.so is found but can't
be used with dlopen.  Early versions of libgomp included an
"optimization" that meant libgomp.so could only be used if it was
linked at compile time and so got loaded by the shared library manager
at program startup.  It could not be loaded at runtime with dlopen.
This has resulted in a number of complaints because it makes libgomp
unusable in embedded settings. Many Linux distributions seem to have
patched this, including current Fedora and RHEL, but I suspect it will
continue to arise from time to time.  If you build R from source you
can work around this by linking R with -lgomp (this doesn't help with
embedded uses of R, though). Here are some relevant threads on this
that Brian Ripley tracked down a while back:


Another option for trying out the current parallel nmath code is
pnmath0, available from the same place as pnmath.  This uses raw
pthreads rather than OpenMP.
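
To check whether the libgomp dlopen restriction applies on a given
system, one can ask R to load the library directly, since dyn.load
goes through dlopen on Linux.  This is only a sketch: the helper name
can_dlopen is made up, and the path is the one reported in the quoted
message and may differ elsewhere.

```r
# Hypothetical helper: TRUE if the library can be loaded at run time,
# FALSE if loading fails, NA if the file does not exist.
can_dlopen <- function(path) {
  if (!file.exists(path)) return(NA)
  tryCatch({
    dyn.load(path)
    TRUE
  }, error = function(e) {
    message(conditionMessage(e))   # e.g. "... cannot be dlopen()ed"
    FALSE
  })
}

can_dlopen("/usr/lib/libgomp.so.1")
```

A FALSE result together with the "cannot be dlopen()ed" message would
point to an unpatched libgomp rather than a search path problem.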



>  Juan Pablo
> On Sun, Jun 29, 2008 at 1:36 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> "Juan Pablo Romero Méndez" <jpablo.romero at gmail.com> writes:
>>> Hello,
>>> The problem I'm working on now requires operating on big matrices.
>>> I've noticed that there are some packages that allow running some
>>> commands in parallel. I've tried snow and NetWorkSpaces, without much
>>> success (they are far slower than the normal functions)
>> Do you mean like this?
>>> library(Rmpi)
>>> mpi.spawn.Rslaves(nsl=2) # dual core on my laptop
>>> m <- matrix(0, 10000, 1000)
>>> system.time(x1 <- apply(m, 2, sum), gcFirst=TRUE)
>>   user  system elapsed
>>  0.644   0.148   1.017
>>> system.time(x2 <- mpi.parApply(m, 2, sum), gcFirst=TRUE)
>>   user  system elapsed
>>  5.188   2.844  10.693
>> ? (This is with Rmpi, a third alternative you did not mention;
>> 'elapsed' time seems to be relevant here.)
>> The basic problem is that the overhead of dividing the matrix up and
>> communicating between processes outweighs the already-efficient
>> computation being performed.
>> One solution is to organize your code into 'coarse' grains, so the FUN
>> in apply does (considerably) more work.
>> A second approach is to develop a better algorithm / use an
>> appropriate R paradigm, e.g.,
>>> system.time(x3 <- colSums(m), gcFirst=TRUE)
>>   user  system elapsed
>>  0.060   0.000   0.088
>> (or even faster, x4 <- rep(0, ncol(m)) ;)
>> A third approach, if your calculations make heavy use of linear
>> algebra, is to build R with a vectorized BLAS library; see the R
>> Installation and Administration guide.
>> A fourth possibility is to use Tierney's 'pnmath' library mentioned in
>> this thread
>> https://stat.ethz.ch/pipermail/r-help/2007-December/148756.html
>> The README file needs to be consulted for the not-exactly-trivial (on
>> my system) task of installing the package. Specific functions are
>> parallelized, provided the length of the calculation makes it seem
>> worth-while.
>>> system.time(exp(m), gcFirst=TRUE)
>>   user  system elapsed
>>  0.108   0.000   0.106
>>> library(pnmath)
>>> system.time(exp(m), gcFirst=TRUE)
>>   user  system elapsed
>>  0.096   0.004   0.052
>> (elapsed time about 2x faster). Both BLAS and pnmath make much better
>> use of resources, since they do not require multiple R instances.
>> None of these approaches would make a colSums faster -- the work is
>> just too small for the overhead.
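
The 'coarse grain' suggestion above can be sketched as follows: split
the columns into one block per worker so that each task does a
substantial amount of work.  This is only an illustration, assuming
the snow-style functions (makeCluster, parLapply, splitIndices) in the
parallel package bundled with recent versions of R; the matrix size,
the worker count and the per-block function are made up.

```r
library(parallel)

set.seed(1)
m <- matrix(rnorm(2000 * 200), 2000, 200)

# Illustrative per-block work: the median of every column in the block.
block_medians <- function(block) apply(block, 2, median)

cl <- makeCluster(2)                        # two local worker processes
idx <- splitIndices(ncol(m), length(cl))    # one set of columns per worker
blocks <- lapply(idx, function(i) m[, i, drop = FALSE])

# One task per worker: each receives only its own block of columns.
x_par <- unlist(parLapply(cl, blocks, block_medians))
stopCluster(cl)

# Sequential reference result for comparison.
x_seq <- apply(m, 2, median)
all.equal(x_par, x_seq)
```

Whether this beats the sequential version depends on how expensive the
per-block work is relative to the cost of shipping each block to its
worker.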
>> Martin
>>> My problem is very simple: it doesn't require any communication
>>> between parallel tasks, only that the task be divided symmetrically
>>> between the available cores. Also, I don't want to run the code on a
>>> cluster, just on my multicore machine (4 cores).
>>> What solution would you propose, given your experience?
>>> Regards,
>>>   Juan Pablo
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>> Location: Arnold Building M2 B169
>> Phone: (206) 667-2793

Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
