[R] matrix of size 30^5

Charles Berry ccberry at ucsd.edu
Sun Apr 21 18:21:36 CEST 2013


Benjamin Caldwell <btcaldwell <at> berkeley.edu> writes:

> 
> Dear R helpers
> 
> Reproducible example:
> 
> #warning - this causes a hard freeze on the machines I've tried it on
> matrix.holder<- matrix(rnorm(150), nrow=30, ncol=5)
> 
> Out = expand.grid(matrix.holder[,1], matrix.holder[,2], matrix.holder[,3],
>                   matrix.holder[,4], matrix.holder[,5])
> 
> Problem:
> 
> I'm running an analysis that I would like to do using a matrix containing
> all the possible combinations of the elements in a [30,5] matrix. Briefly,
> each possible combination is used to index and subset another matrix. I
> then run some models on the data in the subsetted matrix and then sometimes
> export the model results based on a couple criteria. 24,300,000
> combinations seems to be too big for R on my computer (Intel i5, about 2.5
> GB RAM free, 4 GB total, Rx64 2.15) to handle.
> 
> Requests:
> 
[snip]



> I'd like to attempt to multithread [snip]


Ben,

The problem you have is "embarrassingly parallel", as they say.

You can use brute-force solutions to parallelize the job, splitting it
into subjobs that each have much smaller memory requirements.
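To put numbers on the memory savings, here is a rough estimate, assuming the
grid holds 5 double-precision columns at 8 bytes per value and ignoring R's
own object overhead and the temporary copies expand.grid() makes while it
builds the result (so real peak usage is likely higher). The full grid needs
about 972 MB; fixing column 1 cuts each piece to about 32.4 MB, and fixing
columns 1 and 2 cuts it to about 1.08 MB.

```r
# Back-of-the-envelope memory estimate, assuming every column is stored as a
# double (8 bytes per value); R object overhead and the temporary copies
# expand.grid() makes while building are ignored.
n <- 30  # values per column of matrix.holder
k <- 5   # columns of matrix.holder

full_rows <- n^k        # all combinations: 24,300,000 rows
one_fixed <- n^(k - 1)  # column 1 fixed at one value: 810,000 rows
two_fixed <- n^(k - 2)  # columns 1 and 2 fixed: 27,000 rows

grid_bytes <- function(rows) rows * k * 8
mb <- grid_bytes(c(full = full_rows, one_fixed = one_fixed,
                   two_fixed = two_fixed)) / 1e6
mb  # roughly 972, 32.4 and 1.08 MB
```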

One way to parallelize the problem is to create the object 'matrix.holder',
then loop through the values of matrix.holder[,1]. For each value i, create a
subjob that runs all the computations for matrix.holder[i,1] combined with
every combination of the values in matrix.holder[,-1]. Run each subjob in a
new process and save its results. Later on, combine the saved results.
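A minimal sketch of the loop over column 1. For simplicity it runs the 30
subjobs one after another in a single session rather than in separate
processes; to get a fresh process per subjob, the loop body could go into a
script launched once per i with Rscript. run_subjob() is a hypothetical
placeholder for the real subsetting and model fitting, and the output
directory and file names are assumptions.

```r
set.seed(1)  # only so the toy data is reproducible
matrix.holder <- matrix(rnorm(150), nrow = 30, ncol = 5)

# Hypothetical stand-in for the real analysis: takes a data frame of
# combinations and returns a small summary. Replace with the actual
# subsetting and model fitting.
run_subjob <- function(combos) {
  data.frame(n = nrow(combos), mean_sum = mean(rowSums(combos)))
}

out_dir <- file.path(tempdir(), "subjobs")
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_len(nrow(matrix.holder))) {
  # Fix column 1 at its i-th value; only columns 2:5 vary (30^4 rows)
  combos <- expand.grid(matrix.holder[i, 1],
                        matrix.holder[, 2], matrix.holder[, 3],
                        matrix.holder[, 4], matrix.holder[, 5])
  res <- run_subjob(combos)
  saveRDS(res, file.path(out_dir, sprintf("subjob_%02d.rds", i)))
  rm(combos); gc()  # release the 810,000-row grid before the next pass
}

# Later: read back the saved pieces and combine them
files <- list.files(out_dir, pattern = "^subjob_.*\\.rds$", full.names = TRUE)
results <- do.call(rbind, lapply(files, readRDS))
```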

You could also try running the subjobs with parallel::mclapply() or some
other parallelizing package. Or you could loop over the first two columns of
matrix.holder jointly, creating 900 (30 x 30) subjobs; this gives still
smaller memory requirements for the individual jobs.
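A sketch of the 900-subjob variant run through parallel::mclapply() (the
parallel package ships with R since 2.14.0). Each subjob fixes one value from
column 1 and one from column 2, so only 30^3 = 27,000 combinations exist in
memory at a time. run_subjob(), one_pair() and ij_grid are hypothetical
names. mclapply() relies on forking, which Windows does not support, so there
mc.cores has to stay at 1 and parLapply() with makeCluster() is the usual
substitute.

```r
library(parallel)

set.seed(1)  # only so the toy data is reproducible
matrix.holder <- matrix(rnorm(150), nrow = 30, ncol = 5)

# Hypothetical placeholder for the real per-subjob analysis
run_subjob <- function(combos) {
  data.frame(n = nrow(combos), mean_sum = mean(rowSums(combos)))
}

# One subjob per (i, j): column 1 fixed at its i-th value, column 2 at its j-th
ij_grid <- expand.grid(i = seq_len(nrow(matrix.holder)),
                       j = seq_len(nrow(matrix.holder)))  # 900 rows

one_pair <- function(k) {
  i <- ij_grid$i[k]
  j <- ij_grid$j[k]
  # Only columns 3:5 vary: 30^3 = 27,000 rows per subjob
  combos <- expand.grid(matrix.holder[i, 1], matrix.holder[j, 2],
                        matrix.holder[, 3], matrix.holder[, 4],
                        matrix.holder[, 5])
  cbind(i = i, j = j, run_subjob(combos))
}

# Forking is unavailable on Windows, so fall back to one core there;
# elsewhere, adjust mc.cores to the machine (e.g. detectCores()).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- do.call(rbind, mclapply(seq_len(nrow(ij_grid)), one_pair,
                                   mc.cores = n_cores))
```

Each worker returns a one-row data frame tagged with its (i, j), so the
combined table records which subjob produced which result.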

HTH,


