[R] processing a large matrix

Greg Snow Greg.Snow at intermountainmail.org
Mon Feb 12 22:34:16 CET 2007


One approach is to split up the work of computing the correlations. If
you give the 'cor' function 2 matrices, it returns the correlations
between each column of the first and each column of the second. Since
you said it works fine with 10,000 columns but not 30,000, you could
split the columns into 3 pieces and do something like (untested):

 out <- rbind(
     # row 1: block 1 against blocks 1, 2 and 3
     cbind( cor(mymatrix[,1:10000])^2,
            cor(mymatrix[,1:10000], mymatrix[,10001:20000])^2,
            cor(mymatrix[,1:10000], mymatrix[,20001:30000])^2 ),
     # row 2: block 2 against blocks 2 and 3 (lower-left block left as NA)
     cbind( matrix(NA,10000,10000),
            cor(mymatrix[,10001:20000])^2,
            cor(mymatrix[,10001:20000], mymatrix[,20001:30000])^2 ),
     # row 3: block 3 against itself
     cbind( matrix(NA,10000,10000),
            matrix(NA,10000,10000),
            cor(mymatrix[,20001:30000])^2 )
     )

out[ lower.tri(out) ] <- t(out)[ lower.tri(out) ]
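
As a quick sanity check on something small, the same pattern with 2
blocks of 3 columns can be compared against the direct calculation. This
is only a sketch on made-up random data (not your matrix), and the names
'x', 'full' and 'small' are chosen purely for illustration:

```r
set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20)   # made-up data: 20 rows, 6 columns

full <- cor(x)^2                        # direct calculation, for comparison

# upper-triangular blocks only; the lower-left block starts as NA
small <- rbind(
    cbind( cor(x[,1:3])^2, cor(x[,1:3], x[,4:6])^2 ),
    cbind( matrix(NA,3,3), cor(x[,4:6])^2 )
    )

# copy the upper triangle into the lower triangle
small[ lower.tri(small) ] <- t(small)[ lower.tri(small) ]

all.equal(small, full)
```

If the pieces are lined up correctly, the assembled matrix should agree
with the direct one.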

For breaking into 3 pieces, this is probably easier/quicker than trying
to find an alternative.  If you need to break it into even more pieces
(e.g. blocks of 1,000 when there are 30,000 columns) then there are
probably better alternatives: you could loop over blocks, which would
be faster than the loop over individual columns.
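
A rough sketch of what a loop over blocks might look like follows. It
has not been tried on data of the real size, and 'block_cor2' and
'block_size' are names made up for this example. Note that it still
allocates the full result: for 30,000 columns that is 30,000 x 30,000
doubles, about 7.2 GB, so in practice you may need to save or summarise
each block instead of keeping everything in one matrix.

```r
# Squared correlations between all pairs of columns of m, computed one
# pair of column blocks at a time (hypothetical helper, illustration only).
block_cor2 <- function(m, block_size = 1000) {
    p <- ncol(m)
    starts <- seq(1, p, by = block_size)     # first column of each block
    out <- matrix(NA_real_, p, p)
    for (i in seq_along(starts)) {
        rows <- starts[i]:min(starts[i] + block_size - 1, p)
        for (j in i:length(starts)) {         # upper-triangular blocks only
            cols <- starts[j]:min(starts[j] + block_size - 1, p)
            r2 <- cor(m[, rows, drop = FALSE], m[, cols, drop = FALSE])^2
            out[rows, cols] <- r2
            out[cols, rows] <- t(r2)          # mirror into the lower triangle
        }
    }
    out
}

# small usage example on made-up data
set.seed(42)
m <- matrix(rnorm(50 * 7), nrow = 50)
res <- block_cor2(m, block_size = 3)
```

Only one pair of blocks goes through 'cor' at a time, so the temporary
memory used by each call stays small even when the number of columns is
large.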

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of andy1983
> Sent: Monday, February 12, 2007 1:55 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] processing a large matrix
> 
> 
> I would like to compare every column in my matrix with every 
> other column and get the r-squared.
> 
> I tried using the following formula and looping through every column:
> > summary(lm(matrix[,x]~matrix[,y]))$r.squared
> If I have 10,000 columns, the loops (10,000 * 10,000) take 
> forever even if there is no formula inside.
> 
> Then, I attempted to vectorize my code:
> > cor(matrix)^2
> With 10,000 columns, this works great. With 30,000, R tells 
> me it cannot allocate vector of that length even if the 
> memory limit is set to 4 GBs.
> 
> Is there anything else I can do to resolve this issue?
> 
> Thanks.
> --
> View this message in context: 
> http://www.nabble.com/processing-a-large-matrix-tf3216447.html#a8932591
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


