[R] processing a large matrix

Greg Snow Greg.Snow at intermountainmail.org
Mon Feb 12 22:42:47 CET 2007


Given the response by Charles Berry, you should probably think carefully
about what you want to do with the results (I'm hoping that you do not
plan to look at every R^2 value personally).  For instance, if you want
to find which other variable gives the highest R^2 value for each
variable, then this approach may work better:

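myR2fun <- function(i){
  cat("\r",i)     # optional progress display
  flush.console() # optional
  tmp <- cor( mymat[,i], mymat )^2  # R^2 of column i with every column
  tmp[i] <- NA                      # skip the column itself
  which.max(tmp)                    # column index in mymat of the best match
}

out <- sapply( seq_len(ncol(mymat)), myR2fun )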
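
As a quick sanity check of the idea, here is a small sketch on simulated
data.  The names 'demo' and 'best_for' are made up for illustration and
the data are random, so treat it as a sketch rather than a recipe.  It
relies on the fact that the R^2 from a simple regression with an
intercept equals the squared Pearson correlation, and it only ever holds
one row of the R^2 matrix at a time:

```r
set.seed(42)
demo <- matrix(rnorm(40 * 6), nrow = 40)      # simulated stand-in for the real data
demo[, 5] <- demo[, 1] + rnorm(40, sd = 0.2)  # make columns 1 and 5 closely related

# R^2 from lm() and the squared correlation agree for a single predictor
r2_lm  <- summary(lm(demo[, 1] ~ demo[, 5]))$r.squared
r2_cor <- cor(demo[, 1], demo[, 5])^2
all.equal(r2_lm, r2_cor)

# Hypothetical helper: best partner of column i, one row of R^2 at a time
best_for <- function(i, m) {
  r2 <- cor(m[, i], m)^2  # 1 x ncol(m) matrix of R^2 values
  r2[i] <- NA             # ignore the column's correlation with itself
  which.max(r2)           # NA is skipped, index refers to columns of m
}
best <- sapply(seq_len(ncol(demo)), best_for, m = demo)
```

Here 'best[i]' is the column of 'demo' that has the highest R^2 with
column i.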

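The loop over blocks mentioned in my earlier message (quoted below) can
be combined with the same idea.  A full 30,000 x 30,000 matrix of
doubles needs about 7.2 GB, while a 1,000 x 30,000 block needs about
240 MB.  The following is only a sketch: the function name
'best_partner_blocked' and the argument 'block_size' are hypothetical,
and it assumes there are no constant columns (those would give NA
correlations).

```r
# Hypothetical helper: for each column, the index of the other column with
# the highest squared correlation, computed one block of columns at a time.
best_partner_blocked <- function(mymat, block_size = 1000) {
  p <- ncol(mymat)
  best <- integer(p)
  for (start in seq(1, p, by = block_size)) {
    idx <- start:min(start + block_size - 1, p)
    # length(idx) x p matrix of R^2 values for this block only
    r2 <- cor(mymat[, idx, drop = FALSE], mymat)^2
    # blank out each column's correlation with itself
    r2[cbind(seq_along(idx), idx)] <- NA
    best[idx] <- apply(r2, 1, which.max)
  }
  best
}

set.seed(1)
small <- matrix(rnorm(50 * 12), nrow = 50)      # simulated data, 12 columns
small[, 7] <- small[, 2] + rnorm(50, sd = 0.1)  # make columns 2 and 7 close
res <- best_partner_blocked(small, block_size = 5)

# Reference answer from the full matrix (only feasible for small inputs)
full <- cor(small)^2
diag(full) <- NA
ref <- apply(full, 1, which.max)
```

Only one block of R^2 values is in memory at any time, so the block
size can be tuned to whatever fits.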


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Greg Snow
> Sent: Monday, February 12, 2007 2:34 PM
> To: andy1983; r-help at stat.math.ethz.ch
> Subject: Re: [R] processing a large matrix
> 
> One approach is to split up the work of doing the 
> correlations.  If you give the 'cor' function 2 matrices then 
> it gives you the correlations between all pairs of columns.  
> Since you said it works fine with 10,000 columns but not 
> 30,000, you could split into 3 pieces and do something like (untested):
> 
>  out <- rbind(
>      cbind( cor(mymatrix[,1:10000])^2,
>             cor(mymatrix[,1:10000], mymatrix[,10001:20000])^2,
>             cor(mymatrix[,1:10000], mymatrix[,20001:30000])^2 ),
>      cbind( matrix(NA,10000,10000),
>             cor(mymatrix[,10001:20000])^2,
>             cor(mymatrix[,10001:20000], mymatrix[,20001:30000])^2 ),
>      cbind( matrix(NA,10000,10000),
>             matrix(NA,10000,10000),
>             cor(mymatrix[,20001:30000])^2 )
>      )
> 
> out[ lower.tri(out) ] <- t(out)[ lower.tri(out) ]
> 
> For breaking into 3 pieces, this is probably easier/quicker 
> than trying to find an alternative.  If you need to break it 
> into even more pieces (doing blocks of 1,000 when there are 
> 30,000 columns) then there are probably better alternatives 
> (you could do a loop over blocks, which would be faster than 
> the loop over individual columns).
> 
> Hope this helps,
> 
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch 
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of andy1983
> > Sent: Monday, February 12, 2007 1:55 PM
> > To: r-help at stat.math.ethz.ch
> > Subject: [R] processing a large matrix
> > 
> > 
> > I would like to compare every column in my matrix with every other 
> > column and get the r-squared.
> > 
> > I tried using the following formula and looping through every column:
> > > summary(lm(matrix[,x]~matrix[,y]))$r.squared
> > If I have 10,000 columns, the loops (10,000 * 10,000) take forever 
> > even if there is no formula inside.
> > 
> > Then, I attempted to vectorize my code:
> > > cor(matrix)^2
> > With 10,000 columns, this works great. With 30,000, R tells me it 
> > cannot allocate a vector of that length even if the memory limit is 
> > set to 4 GB.
> > 
> > Is there anything else I can do to resolve this issue?
> > 
> > Thanks.
> > --
> > View this message in context: 
> > http://www.nabble.com/processing-a-large-matrix-tf3216447.html#a8932591
> > Sent from the R help mailing list archive at Nabble.com.
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 