[R] more efficient way to parallel

Martin Morgan mtmorgan at fhcrc.org
Mon Aug 6 18:50:50 CEST 2012


On 08/06/2012 09:41 AM, Jie wrote:
> After searching online, I found that clusterCall or foreach might be the
> solution.

Rewrite your outer loop as an lapply; then on non-Windows systems use 
parallel::mclapply, or on Windows use makePSOCKcluster and parLapply. I 
ended with

library(parallel)
library(MASS)
Maxi <- 10
Maxj <- 1000

doit <- function(i, Maxi, Maxj)
{
     ## i is the iteration index supplied by lapply; Maxi is unused here
     ## initialization, not of interest
     Sigmahalf <- matrix(sample(10000, replace=TRUE),  100)
     Sigma <- t(Sigmahalf) %*% Sigmahalf
     x <- mvrnorm(n=Maxj, rep(0, 100), Sigma)
     ## one 10 x 10 matrix per row of x
     xlist <- lapply(seq_len(nrow(x)), function(j, x) matrix(x[j,], 10), x)
     ## end of initialization

     ## smallest absolute eigenvalue; eigenvectors are not needed
     fun <- function(x) {
         v <- eigen(x, symmetric=FALSE, only.values=TRUE)$values
         min(abs(v))
     }
     dd1 <- sapply(xlist, fun)         # computed once, reused for dd2
     dd2 <- dd1 + dd1 / sum(dd1)
     sum(dd1 * dd2)
}
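
As a side note on the eigen call in fun: a small sketch, separate from 
the timing run, of what the two extra arguments do. only.values=TRUE 
skips the eigenvector computation, and symmetric=FALSE tells eigen not 
to test the matrix for symmetry first. The 10 x 10 matrix m and the seed 
are arbitrary choices for illustration.

```r
set.seed(1)                       # arbitrary seed, for a repeatable illustration
m <- matrix(rnorm(100), 10)       # a general (non-symmetric) 10 x 10 matrix

full <- eigen(m)                  # tests symmetry, computes values and vectors
vals <- eigen(m, symmetric=FALSE, only.values=TRUE)

is.null(vals$vectors)             # TRUE: no eigenvectors were computed
## the quantity used in doit() should agree either way (up to rounding)
all.equal(min(abs(full$values)), min(abs(vals$values)))
```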

 > system.time(lapply(1:8, doit, Maxi, Maxj))
    user  system elapsed
   6.677   0.016   6.714
 > system.time(mclapply(1:64, doit, Maxi, Maxj, mc.cores=8))
    user  system elapsed
  68.857   1.032  10.398
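
For the Windows route mentioned above, a minimal sketch using a PSOCK 
cluster. sim_one is a hypothetical stand-in for doit, kept tiny so the 
block runs on its own; the worker count of 2 and the seed are arbitrary. 
PSOCK workers start as fresh R sessions, so any package the function 
relies on (here MASS, for mvrnorm) has to be loaded on each worker.

```r
library(parallel)

## hypothetical stand-in for doit(), kept tiny so the sketch runs on its own;
## like doit(), it calls mvrnorm() and so needs MASS on every worker
sim_one <- function(i, n) {
    x <- mvrnorm(n, rep(0, 2), diag(2))
    sum(x^2)
}

cl <- makePSOCKcluster(2)              # 2 workers: an arbitrary choice
clusterEvalQ(cl, library(MASS))        # load MASS in each fresh worker session
clusterSetRNGStream(cl, 123)           # reproducible random streams (arbitrary seed)
res <- parLapply(cl, 1:8, sim_one, n=100)
stopCluster(cl)                        # release the workers when done

length(res)                            # one result per element of 1:8
```

With the real function the call would presumably be 
parLapply(cl, 1:64, doit, Maxi, Maxj), after the same clusterEvalQ step.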

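The extra arguments to eigen are important, as is avoiding unnecessary 
repeated calculations (here, the eigenvalues are computed once and 
reused for dd2). Note that the mclapply run does eight times the work of 
the lapply run (64 iterations rather than 8) in about 1.5 times the 
elapsed time. The strategy of allocate-and-grow 
(result.vec=numeric(); result.vec[i] <- ...) is very inefficient 
(result.vec is copied in its entirety for each new value of i); better 
to preallocate-and-fill (result.vec = numeric(Maxi); result.vec[i] = ...) 
or let lapply manage the allocation.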
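
To make the three strategies concrete, a small sketch; f is a 
hypothetical stand-in for the per-iteration computation, and n is an 
arbitrary size.

```r
f <- function(i) sqrt(i)          # hypothetical stand-in for real work
n <- 1e4                          # arbitrary size

## allocate-and-grow: the vector is extended on every iteration
grow <- numeric()
for (i in seq_len(n)) grow[i] <- f(i)

## preallocate-and-fill: allocate once, then assign in place
fill <- numeric(n)
for (i in seq_len(n)) fill[i] <- f(i)

## let lapply manage the allocation
app <- unlist(lapply(seq_len(n), f))

identical(grow, fill) && identical(fill, app)   # all three give the same vector
```

Wrapping each variant in system.time() gives a rough comparison; how 
large the gap is will likely depend on the R version and on n.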

Martin

>
> Best wishes,
> Jie
>
> On Sun, Aug 5, 2012 at 10:23 PM, Jie <jimmycloud at gmail.com> wrote:
>
>> Dear All,
>>
>> Suppose I have a program as below: the outer loop runs a simulation (with
>> randomly generated data), and inside it there are several sapply() calls
>> (10~100) over the data, plus some other work; these sapply() calls have to
>> run sequentially. Each sapply() does not involve very intensive calculation
>> (a few seconds only), so one iteration of the outer loop takes minutes to
>> finish.
>> I guess the better way is to parallelize the outer loop rather than the
>> sapply() calls, but I have no idea how to modify it. I have some simple
>> code here, with only two sapply() calls for simplicity. The logic in the
>> sapply() calls is not important.
>> Thank you for your attention and suggestions.
>>
>> library(parallel)
>> library(MASS)
>> result.seq=c()
>> Maxi <- 100
>> for (i in 1:Maxi)
>> {
>> ## initialization, not of interest
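>> Sigmahalf <- matrix(sample(1:10000, size = 10000, replace = TRUE), 100)
>> Sigma <- t(Sigmahalf) %*% Sigmahalf
>> ## the mean vector must match the 100 x 100 Sigma
>> x <- mvrnorm(n=1000, rep(0, 100), Sigma)
>> xlist <- list()
>> for (j in 1:1000)
>> {
>> ## 100 values per row, reshaped to a square 10 x 10 matrix for eigen()
>> xlist[[j]] <- list(X = matrix(x[j, ], 10))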
>> }
>> ## end of initialization
>>
>> dd1 <- sapply(xlist,function(s) {min(abs((eigen(s$X))$values))})
>>   ##
>> sumdd1=sum(dd1)
>> for (j in 1:1000)
>> {
>> xlist[[j]]$dd1 <- dd1[j]/sumdd1
>> }
>>    ## Assume dd2 and dd1 can not be combined in one sapply()
>> dd2 <- sapply(xlist, function(s){min(abs((eigen(s$X))$values))+s$dd1})
>> result.seq[i] <- sum(dd1*dd2)
>>
>> }
>>
>>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


