[R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

Daren Tan daren76 at hotmail.com
Wed Nov 26 16:37:55 CET 2008


Out of desperation, I wrote the following function, but Hadley beat me to it :P. Thanks, everyone, for the great help.
 

cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
  p <- pt(STATISTIC, df)
  return(2 * pmin(p, 1 - p))
}
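
For what it's worth, here is a minimal sketch of how the function might be used, assuming the rows of m1 and m2 are the variables being correlated (as in the apply() calls quoted below). The function is repeated so the snippet runs on its own. The matrix sizes, the seed, and the names r, p and check are only illustrative. Note that the c() inside the function drops the matrix dimensions, so they are restored afterwards.

```r
# Copy of the function above, repeated so this snippet is self-contained.
cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
  p <- pt(STATISTIC, df)
  return(2 * pmin(p, 1 - p))
}

set.seed(1)                            # illustrative seed
m1 <- matrix(rnorm(200), nrow = 10)    # 10 rows, 20 observations each
m2 <- matrix(rnorm(160), nrow = 8)     # 8 rows, 20 observations each

# cor() correlates columns, so transpose to correlate rows of m1 with rows of m2.
r <- cor(t(m1), t(m2))                 # 10 x 8 matrix of correlations

# n is the number of observations per row, i.e. the number of columns.
p <- cor.p.values(r, ncol(m1))
dim(p) <- dim(r)                       # c() inside the function dropped the dims

# Spot check one pair against cor.test().
check <- cor.test(m1[3, ], m2[5, ])$p.value
all.equal(p[3, 5], check)
```

The result here is indexed as [row of m1, row of m2], which is the transpose of what the nested apply() calls return.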

> Date: Wed, 26 Nov 2008 09:33:59 -0600
> From: h.wickham at gmail.com
> To: jholtman at gmail.com
> Subject: Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
> CC: daren76 at hotmail.com; r-help at stat.math.ethz.ch
> 
> On Wed, Nov 26, 2008 at 8:14 AM, jim holtman  wrote:
>> Your time is being taken up in cor.test because you are calling it
>> 100,000 times. So grin and bear it with the amount of work you are
>> asking it to do.
>>
>> Here I am only calling it 100 times:
>>
>>> m1 <- matrix(rnorm(10000), ncol=100)
>>> m2 <- matrix(rnorm(10000), ncol=100)
>>> Rprof('/tempxx.txt')
>>> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) }))
>> user system elapsed
>> 8.86 0.00 8.89
>>>
>>
>> so my guess is that calling it 100,000 times will take: 100,000 *
>> 0.0886 seconds or about 3 hours.
> 
> You can make it ~3 times faster by vectorising the testing:
> 
> m1 <- matrix(rnorm(10000), ncol=100)
> m2 <- matrix(rnorm(10000), ncol=100)
> 
> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
> function(y) { cor.test(x,y)$p.value })}))
> 
> 
> system.time({
> r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
> 
> df <- ncol(m1) - 2  # n is the length of each row passed to cor()
> t <- sqrt(df) * r / sqrt(1 - r ^ 2)
> p <- pt(t, df)
> p <- 2 * pmin(p, 1 - p)
> })
> 
> 
> all.equal(cor.pvalues, p)
> 
> 
> You can make cor much faster by stripping away all the error-checking
> code and calling the internal C function directly (suggested by the
> Rprof output):
> 
> 
> system.time({
> r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
> })
> 
> system.time({
> r2 <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
> .Internal(cor(x, y, 4L, FALSE)) })})
> })
> 
> 1.5 s vs 0.2 s on my computer. Combining both changes gives me a ~25-fold
> speed-up. I suspect you can do even better if you think about
> what calculations are being duplicated in the computation of the
> correlations.
> 
> Hadley
> 
> -- 
> http://had.co.nz/
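
One possible reading of the hint about duplicated calculations, as a rough sketch rather than anything from the thread: each row gets centred and scaled again for every pair it appears in, so doing that once per row leaves a single matrix product. The helper unit.rows and the variable names are hypothetical, and the small matrices are only for illustration.

```r
set.seed(2)                            # illustrative seed
m1 <- matrix(rnorm(200), nrow = 10)    # 10 rows, 20 observations each
m2 <- matrix(rnorm(160), nrow = 8)     # 8 rows, 20 observations each
n  <- ncol(m1)                         # observations per row

# Hypothetical helper: centre each row and scale it to unit length, once.
unit.rows <- function(m) {
  m <- m - rowMeans(m)
  m / sqrt(rowSums(m^2))
}

# Each entry is the dot product of two centred unit-length rows,
# which is Pearson's r for that pair of rows.
r <- tcrossprod(unit.rows(m1), unit.rows(m2))   # nrow(m1) x nrow(m2)

# Same t-based two-sided p-value as in the quoted code, for the whole matrix.
df <- n - 2
tstat <- sqrt(df) * r / sqrt(1 - r^2)
p <- 2 * pt(-abs(tstat), df)

# Agrees with cor() on the transposed matrices.
all.equal(r, cor(t(m1), t(m2)))
```

This avoids both apply() loops entirely, at the cost of holding the full correlation matrix in memory.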