[R] applying cor.test to a (m, n) matrix - SUMMARY

Monica Pisica pisicandru at hotmail.com
Fri May 9 16:56:31 CEST 2008


Hi again,

I've got few very good options from the list and since they were not posted to the list, I will provide a summary. Thank you very much to all who answered and I hope this summary will benefit others interested in solving similar problems like that.

Yasir Kaheil re-wrote my original code in a more streamlined way:
pr<-array(0,c(dim(x)[2],dim(x)[2])); 
for (i in 1:dim(x)[2]) for (j in 1:dim(x)[2])
pr[i,j]<-cor.test(x[,i],x[,j])$p.val;

If column names are of importance they can be added like:
rownames(pr) <- colnames(x)
colnames(pr) <- colnames(x)


Jorge Ivan Velez sent 2 different solutions, and I am not quite sure which I like better. I think the first one is very nice if such a table needs to be published since the upper diagonal correspond to p values and lower diagonal to correlation values (this will satisfy any finicky editor who really, REALLY wants the p-values;-)). The second function is very good for looking in one sweep at the results of cor.test() applied to a matrix.

cor.pvalue <- function(X,method="pearson", use="complete" ) {
dfr = nrow(X) - 2
R <- cor(X,method=method, use= use)
above <- row(R) < col(R)
r2 <- R[above]^2
Fstat <- r2 * dfr / (1 - r2)
R[above] <- 1 - pf(Fstat, 1, dfr)
R
}

# Example
m <- matrix(rnorm(10*5), ncol=5); colnames(m)=paste("m",1:ncol(m),sep="")
cor.pvalue(m)
            m1          m2          m3        m4        m5
m1  1.00000000  0.24244042  0.93991694 0.5563147 0.9154130
m2  0.40750935  1.00000000  0.77016121 0.9745312 0.3762165
m3  0.02748730  0.10626089  1.00000000 0.3336439 0.8325926
m4 -0.21211652 -0.01164448 -0.34184398 1.0000000 0.2059275
m5 -0.03872636  0.31444921  0.07698371 0.4376249 1.0000000


correl.stats=function(X, method = "pearson", use = "complete" , conf.level = 0.95){
require(forward)
combs=t(fwd.combn(colnames(X), 2))
temp=t(apply(combs,1, function(x){
Y=X[,as.character(x)]
res=cor.test(Y[,1],Y[,2], use = use, method = method, conf.level = conf.level)
temp2=c(res$estimate, res$statistic, res$p.value, res$conf.int[1:2])
names(temp2)=c('rho','statistic','pvalue','lower','upper')
rm(res)
temp2
}
)
)
rownames(temp)=paste(combs[,1],combs[,2],sep="")
temp
}

# Example
correl.stats(m)

             rho   statistic    pvalue      lower     upper
m1m2  0.40750935  1.26216512 0.2424404 -0.2987766 0.8253647
m1m3  0.02748730  0.07777522 0.9399169 -0.6127436 0.6459346
m1m4 -0.21211652 -0.61392640 0.5563147 -0.7425696 0.4818648
m1m5 -0.03872636 -0.10961692 0.9154130 -0.6524440 0.6056680
m2m3  0.10626089  0.30226250 0.7701612 -0.5608917 0.6897404
m2m4 -0.01164448 -0.03293778 0.9745312 -0.6366034 0.6225461
m2m5  0.31444921  0.93692273 0.3762165 -0.3929818 0.7880525
m3m4 -0.34184398 -1.02886285 0.3336439 -0.7994101 0.3667110
m3m5  0.07698371  0.21839093 0.8325926 -0.5807942 0.6739433
m4m5  0.43762492  1.37661090 0.2059275 -0.2650270 0.8367053

Phil Spector did a wonderful job in providing the list of matrices with the values out of cor.test(). Strangely enough the correlation values of a column with itself is 0 instead of 1, although all the other correlations have correct values. And of course this changes the lower and upper limits to 0 instead of 1 as well, although the p-value is correct in this case. Sincerely I don't see why that is. I will provide his example as well with the same m matrix I used above:

mydat = m
n = dim(mydat)[2]
 
makemat = function(fun){
cc = combn(n,2)
results = apply(cc,2,function(x)cor.test(mydat[,x[1]],mydat[,x[2]]))
answer = matrix(0,n,n)
now = sapply(results,fun)
answer[t(cc)] = now
answer[t(cc)[,c(2,1)]] = now
answer
}
 
lapply(list(estimate=function(x)x[['estimate']],
pvalue = function(x)x[['p.value']],
lower = function(x)x[['conf.int']][1],
upper = function(x)x[['conf.int']][2]),makemat)
$estimate
            [,1]        [,2]        [,3]        [,4]        [,5]
[1,]  0.00000000  0.40750935  0.02748730 -0.21211652 -0.03872636
[2,]  0.40750935  0.00000000  0.10626089 -0.01164448  0.31444921
[3,]  0.02748730  0.10626089  0.00000000 -0.34184398  0.07698371
[4,] -0.21211652 -0.01164448 -0.34184398  0.00000000  0.43762492
[5,] -0.03872636  0.31444921  0.07698371  0.43762492  0.00000000

$pvalue
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.0000000 0.2424404 0.9399169 0.5563147 0.9154130
[2,] 0.2424404 0.0000000 0.7701612 0.9745312 0.3762165
[3,] 0.9399169 0.7701612 0.0000000 0.3336439 0.8325926
[4,] 0.5563147 0.9745312 0.3336439 0.0000000 0.2059275
[5,] 0.9154130 0.3762165 0.8325926 0.2059275 0.0000000

$lower
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  0.0000000 -0.2987766 -0.6127436 -0.7425696 -0.6524440
[2,] -0.2987766  0.0000000 -0.5608917 -0.6366034 -0.3929818
[3,] -0.6127436 -0.5608917  0.0000000 -0.7994101 -0.5807942
[4,] -0.7425696 -0.6366034 -0.7994101  0.0000000 -0.2650270
[5,] -0.6524440 -0.3929818 -0.5807942 -0.2650270  0.0000000

$upper
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.0000000 0.8253647 0.6459346 0.4818648 0.6056680
[2,] 0.8253647 0.0000000 0.6897404 0.6225461 0.7880525
[3,] 0.6459346 0.6897404 0.0000000 0.3667110 0.6739433
[4,] 0.4818648 0.6225461 0.3667110 0.0000000 0.8367053
[5,] 0.6056680 0.7880525 0.6739433 0.8367053 0.0000000
 

Dimitris Rizopoulos suggested to look at rcor.test() from ltm, which I will do shortly.

Thank you again for all your thorough answers,

Monica



_________________________________________________________________


esh_messenger_052008


More information about the R-help mailing list