[R] Correlation of huge matrix saved as binary file

Bryo brynedal at gmail.com
Fri Mar 2 23:50:36 CET 2012


Hi,

I have a 900,000,000*9,000 matrix and need to calculate the correlation
between all pairs of entries along the smaller dimension, giving a 9k*9k
correlation matrix. The matrix is too big to be loaded into R and is saved
as a binary file. To access the data in the file I use mmap and some
API functions (to get all values in one row, one column, or one particular
value). I'm looking for advice on how to calculate the correlation
matrix. Right now my approach is something like this (toy code):

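# Numeric result matrix; only the upper triangle (i < j) gets filled
cor.matrix <- matrix(NA_real_, nrow = 9000, ncol = 9000)

# Stop at 8999: with i = 9000, (i+1):9000 would count down as 9001, 9000
for (i in 1:8999) {
  # i1 = ... getting the index of item (i) in a second file
  g1 <- api$getCol(i1)   # read column i once per outer iteration
  for (j in (i + 1):9000) {
    # i2 = .... getting the index of item (j)
    g2 <- api$getCol(i2)
    cor.matrix[i, j] <- cor(g1, g2)
  }
}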
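
For anyone who wants to try the loop at small scale, here is a minimal sketch that swaps the mmap-backed api for an in-memory stand-in. The helper make_api, the tiny dimensions, and the assumption that the index of item i is simply i are all made up for illustration; the real api and index lookup are not shown in this post.

```r
# Hypothetical stand-in for the mmap-backed api: wraps an in-memory matrix
# and exposes only getCol(), the one call the loop needs.
make_api <- function(m) {
  list(getCol = function(k) m[, k])
}

set.seed(1)
n.rows <- 200   # stands in for the 900,000,000 rows
n.cols <- 6     # stands in for the 9,000 columns
m <- matrix(rnorm(n.rows * n.cols), nrow = n.rows, ncol = n.cols)
api <- make_api(m)

cor.matrix <- matrix(NA_real_, nrow = n.cols, ncol = n.cols)
for (i in seq_len(n.cols - 1)) {
  g1 <- api$getCol(i)            # assumption: index of item i is just i
  for (j in (i + 1):n.cols) {
    g2 <- api$getCol(j)
    cor.matrix[i, j] <- cor(g1, g2)
  }
}

# Only the upper triangle is filled; compare it with cor() on the full matrix.
upper <- upper.tri(cor.matrix)
all.equal(cor.matrix[upper], cor(m)[upper])
```

The final comparison is only possible here because the toy matrix fits in memory; it is a sanity check on the loop, not something that can be run on the real file.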

This will work, but will take forever: it needs roughly 40 million calls to
cor(), each on two full columns. Any advice on how this can be done
more efficiently? I'm running on a 2.6.18 Linux system with R version
2.11.1.

Thanks!




