[R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

David Winsemius dwinsemius at comcast.net
Wed Nov 26 16:08:56 CET 2008


He might try rcorr from Hmisc instead. Using your test suite, it gives  
about a 20% improvement on my MacPro:

 > m1 <- matrix(rnorm(10000), ncol=100)
 > m2 <- matrix(rnorm(10000), ncol=100)
 > Rprof('/tempxx.txt')
 > system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,  
function(y) { rcorr(x,y)$P }) }))
    user  system elapsed
   4.221   0.049   4.289

 > m1 <- matrix(rnorm(10000), ncol=100)
 > m2 <- matrix(rnorm(10000), ncol=100)
 > Rprof('/tempxx.txt')
 > system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,  
function(y) { cor.test(x,y)$p.value }) }))
    user  system elapsed
   5.328   0.038   5.355
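Jim's profile, quoted further down, suggests that much of cor.test's time goes to argument handling (match.arg, deparse, paste) rather than to the correlation itself. A minimal sketch, assuming complete data and the default two-sided Pearson test, is to compute the same t statistic directly: t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The helper name pearson_p is hypothetical. Separately, rcorr(x, y)$P is a 2 x 2 matrix with NA on the diagonal, so the p-value for the pair itself is rcorr(x, y)$P[1, 2].

```r
# Hypothetical helper: two-sided Pearson p-value for one pair of vectors,
# using the same t statistic and df as cor.test()'s default method,
# but skipping cor.test()'s argument matching and deparsing.
# Assumes x and y have equal length and no missing values.
pearson_p <- function(x, y) {
  r  <- cor(x, y)
  df <- length(x) - 2
  t  <- sqrt(df) * r / sqrt(1 - r^2)
  2 * pt(-abs(t), df)
}

set.seed(1)
m1 <- matrix(rnorm(10000), ncol = 100)
m2 <- matrix(rnorm(10000), ncol = 100)

# Same double-apply layout as the original: rows follow m2, columns follow m1.
fast.pvalues <- apply(m1, 1, function(x) apply(m2, 1, function(y) pearson_p(x, y)))
slow.pvalues <- apply(m1, 1, function(x) apply(m2, 1, function(y) cor.test(x, y)$p.value))
all.equal(fast.pvalues, slow.pvalues)  # should print TRUE
```

This still makes one R function call per pair, so it only trims the per-call overhead; it does not change the number of calls.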

I'm not a smart enough programmer to figure out whether there might be
an even more efficient method that takes advantage of rcorr's implicit
"looping" through a set of columns to return all pairwise combinations.
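One way to get all combinations in one go, without Hmisc, might look like the sketch below: cor() on the transposed matrices returns every row-by-row correlation in a single call, and the p-values then follow from the same t statistic on n - 2 df that cor.test uses. It assumes no missing values, equal column counts in both matrices, and the default two-sided Pearson test; cor_pvalues and its chunk argument are hypothetical names. Working through m2 in blocks of rows keeps the intermediate matrices small. At the full size (1,000 x 100,000) the result alone holds 10^8 doubles, roughly 800 MB.

```r
# Hypothetical helper: p-values for every (row of b, row of a) pair at once.
# Assumes no missing values, ncol(a) == ncol(b), and a two-sided Pearson test.
cor_pvalues <- function(a, b, chunk = 10000) {
  df  <- ncol(a) - 2                          # n - 2 degrees of freedom
  out <- matrix(NA_real_, nrow(b), nrow(a))   # rows follow b, columns follow a,
                                              # matching the double apply() layout
  for (start in seq(1, nrow(b), by = chunk)) {
    idx <- start:min(start + chunk - 1, nrow(b))
    # cor() correlates columns, so transpose: result is length(idx) x nrow(a)
    r <- cor(t(b[idx, , drop = FALSE]), t(a))
    tstat <- sqrt(df) * r / sqrt(1 - r^2)
    out[idx, ] <- 2 * pt(-abs(tstat), df)
  }
  out
}

set.seed(42)
m1 <- matrix(rnorm(10000), ncol = 100)
m2 <- matrix(rnorm(10000), ncol = 100)

p.fast <- cor_pvalues(m1, m2, chunk = 30)     # small chunk only to exercise the loop
p.slow <- apply(m1, 1, function(x) apply(m2, 1, function(y) cor.test(x, y)$p.value))
all.equal(p.fast, p.slow)
```

For the original data this would be cor_pvalues(m1, m2), giving a 100,000 x 1,000 matrix laid out like cor.pvalues from the nested apply calls.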

-- 
David Winsemius, MD
Heritage Labs


On Nov 26, 2008, at 9:14 AM, jim holtman wrote:

> Your time is being taken up in cor.test because you are calling it
> 100,000,000 times (1,000 rows of m1 times 100,000 rows of m2). So grin
> and bear it, given the amount of work you are asking it to do.
>
> Here I am calling it only 10,000 times (100 x 100):
>
>> m1 <- matrix(rnorm(10000), ncol=100)
>> m2 <- matrix(rnorm(10000), ncol=100)
>> Rprof('/tempxx.txt')
>> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,  
>> function(y) { cor.test(x,y)$p.value }) }))
>   user  system elapsed
>   8.86    0.00    8.89
>>
>
> so my guess is that calling it 100,000,000 times will take about
> 10,000 * 8.86 seconds = 88,600 seconds, or roughly a day.
>
> If you run Rprof, you will see it is spending most of its time there:
>
>  0   8.8 root
>  1.    8.8 apply
>  2. .    8.8 FUN
>  3. . .    8.8 apply
>  4. . . .    8.7 FUN
>  5. . . . .    8.6 cor.test
>  6. . . . . .    8.4 cor.test.default
>  7. . . . . . .    2.4 match.arg
>  8. . . . . . . .    1.7 eval
>  9. . . . . . . . .    1.4 deparse
> 10. . . . . . . . . .    0.6 .deparseOpts
> 11. . . . . . . . . . .    0.2 pmatch
> 11. . . . . . . . . . .    0.1 sum
> 10. . . . . . . . . .    0.5 %in%
> 11. . . . . . . . . . .    0.3 match
> 12. . . . . . . . . . . .    0.3 is.factor
> 13. . . . . . . . . . . . .    0.3 inherits
>  8. . . . . . . .    0.2 formals
>  9. . . . . . . . .    0.2 sys.function
>  7. . . . . . .    2.1 cor
>  8. . . . . . . .    1.1 match.arg
>  9. . . . . . . . .    0.7 eval
> 10. . . . . . . . . .    0.6 deparse
> 11. . . . . . . . . . .    0.3 .deparseOpts
> 12. . . . . . . . . . . .    0.1 pmatch
> 11. . . . . . . . . . .    0.2 %in%
> 12. . . . . . . . . . . .    0.2 match
> 13. . . . . . . . . . . . .    0.1 is.factor
> 14. . . . . . . . . . . . . .    0.1 inherits
>  9. . . . . . . . .    0.1 formals
>  8. . . . . . . .    0.5 stopifnot
>  9. . . . . . . . .    0.2 match.call
>  8. . . . . . . .    0.1 pmatch
>  8. . . . . . . .    0.1 is.data.frame
>  9. . . . . . . . .    0.1 inherits
>  7. . . . . . .    1.5 paste
>  8. . . . . . . .    1.4 deparse
>  9. . . . . . . . .    0.6 .deparseOpts
> 10. . . . . . . . . .    0.3 pmatch
> 10. . . . . . . . . .    0.1 any
>  9. . . . . . . . .    0.6 %in%
> 10. . . . . . . . . .    0.6 match
> 11. . . . . . . . . . .    0.5 is.factor
> 12. . . . . . . . . . . .    0.4 inherits
> 13. . . . . . . . . . . . .    0.2 mode
>  7. . . . . . .    0.4 switch
>  8. . . . . . . .    0.1 qnorm
>  7. . . . . . .    0.2 pt
>  5. . . . .    0.1 $
>
> On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan <daren76 at hotmail.com>  
> wrote:
>>
>> My two matrices are roughly the sizes of m1 and m2 below. I tried
>> using two nested apply calls with cor.test to compute the
>> correlation p-values. After more than an hour, the code is still
>> running. Please help me make it more efficient.
>>
>> m1 <- matrix(rnorm(100000), ncol=100)
>> m2 <- matrix(rnorm(10000000), ncol=100)
>>
>> cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y)  
>> { cor.test(x,y)$p.value }) })
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>


