[R] Parallel R

Juan Pablo Romero Méndez jpablo.romero at gmail.com
Mon Jun 30 07:18:07 CEST 2008


Thanks!

It turned out that Rmpi was a good option for this problem after all.

Nevertheless, pnmath seems very promising, although it doesn't load on my system:


> library(pnmath)
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared library
'/home/jpablo/extra/R-271/lib/R/library/pnmath/libs/pnmath.so':
  libgomp.so.1: shared object cannot be dlopen()ed
Error: package/namespace load failed for 'pnmath'


I find this odd, because libgomp.so.1 is in /usr/lib, so R should be able to find it.
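
(A hedged diagnostic, not a fix: asking R to dlopen the library by its
full path should show whether dlopen itself can open libgomp in this
process, or whether the failure is specific to loading pnmath.)

> dyn.load("/usr/lib/libgomp.so.1")  # diagnostic: load libgomp directly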


  Juan Pablo


On Sun, Jun 29, 2008 at 1:36 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> "Juan Pablo Romero Méndez" <jpablo.romero at gmail.com> writes:
>
>> Hello,
>>
>> The problem I'm working on now requires operating on big matrices.
>>
>> I've noticed that there are some packages that allow running some
>> commands in parallel. I've tried snow and NetWorkSpaces, without much
>> success (they are far slower than the normal functions).
>
> Do you mean like this?
>
>> library(Rmpi)
>> mpi.spawn.Rslaves(nsl=2) # dual core on my laptop
>> m <- matrix(0, 10000, 1000)
>> system.time(x1 <- apply(m, 2, sum), gcFirst=TRUE)
>   user  system elapsed
>  0.644   0.148   1.017
>> system.time(x2 <- mpi.parApply(m, 2, sum), gcFirst=TRUE)
>   user  system elapsed
>  5.188   2.844  10.693
>
> ? (This is with Rmpi, a third alternative you did not mention;
> 'elapsed' time seems to be relevant here.)
>
> The basic problem is that the overhead of dividing the matrix up and
> communicating between processes outweighs the already-efficient
> computation being performed.
>
> One solution is to organize your code into 'coarse' grains, so the FUN
> in apply does (considerably) more work.
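>
> For instance, a minimal sketch of the coarse-grained idea (assuming
> the two slaves spawned above, and Rmpi's mpi.bcast.Robj2slave to ship
> the matrix to them once): hand each slave one large block of columns
> rather than one column per task, so the communication cost is paid
> per block instead of per column.
>
>> # copy m into each slave's workspace once
>> mpi.bcast.Robj2slave(m)
>> # two contiguous blocks of column indices, one per slave
>> chunks <- split(seq_len(ncol(m)), cut(seq_len(ncol(m)), 2, labels=FALSE))
>> # each parallel task now sums a whole block of columns
>> x2b <- unlist(mpi.parLapply(chunks, function(idx) colSums(m[, idx])))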
>
> A second approach is to develop a better algorithm / use an
> appropriate R paradigm, e.g.,
>
>> system.time(x3 <- colSums(m), gcFirst=TRUE)
>   user  system elapsed
>  0.060   0.000   0.088
>
> (or even faster, x4 <- rep(0, ncol(m)) ;)
>
> A third approach, if your calculations make heavy use of linear
> algebra, is to build R with a vectorized BLAS library; see the R
> Installation and Administration guide.
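>
> (A quick, hedged check -- timings are of course machine-dependent --
> is to time a pure linear-algebra call before and after switching
> BLAS; with a vectorized BLAS the elapsed time should drop:)
>
>> a <- matrix(rnorm(1e6), 1000, 1000)
>> system.time(crossprod(a), gcFirst=TRUE)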
>
> A fourth possibility is to use Tierney's 'pnmath' library mentioned in
> this thread
>
> https://stat.ethz.ch/pipermail/r-help/2007-December/148756.html
>
> The README file needs to be consulted for the not-exactly-trivial (on
> my system) task of installing the package. Specific functions are
> parallelized, provided the length of the calculation makes it seem
> worthwhile.
>
>> system.time(exp(m), gcFirst=TRUE)
>   user  system elapsed
>  0.108   0.000   0.106
>> library(pnmath)
>> system.time(exp(m), gcFirst=TRUE)
>   user  system elapsed
>  0.096   0.004   0.052
>
> (elapsed time about 2x faster). Both BLAS and pnmath make much better
> use of resources, since they do not require multiple R instances.
>
> None of these approaches would make colSums faster -- the work is
> just too small for the overhead.
>
> Martin
>
>> My problem is very simple: it doesn't require any communication
>> between parallel tasks, only that the task be divided evenly among
>> the available cores. Also, I don't want to run the code on a
>> cluster, just on my multicore machine (4 cores).
>>
>> What solution would you propose, given your experience?
>>
>> Regards,
>>
>>   Juan Pablo
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>


