[R] correlation matrix - large dataset

Douglas Bates bates at stat.wisc.edu
Tue Jan 8 16:07:25 CET 2008


On Jan 8, 2008 12:34 AM, suman Duvvuru <duvvuru.suman at gmail.com> wrote:
> Hello,

> I have a dataset with 20,000 variables and I would like to compute a Pearson
> correlation matrix, which will be 20000 x 20000. The cor() function doesn't
> work in this case due to memory problems. If you have any ideas regarding a
> feasible way to compute correlations on such a huge dataset, please help me
> out.

Considering that a single copy of such a matrix, stored as a dense
matrix of 8-byte doubles, is about 3 Gb (the calculation below is in Mb)

> 20000^2 * 8 / (2^20)
[1] 3051.8

I'm not surprised that you run into memory problems.

Perhaps it is time to look at the forest instead of the trees.  What
would you do with such a matrix if you were able to calculate and
store it?
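
If, say, only the strongly correlated pairs are of interest, the full
matrix never has to exist at once.  What follows is a rough sketch under
that assumption, not anything built into R: standardize the columns
once, form the correlations one block of columns at a time with
crossprod(), and keep only the pairs that pass a cutoff.  The names
strong_pairs, block and cutoff are made up for the illustration.

```r
## Block-wise Pearson correlations, keeping only the strong pairs.
## 'x' is a numeric matrix with observations in rows, variables in columns.
## strong_pairs(), 'block' and 'cutoff' are hypothetical names for this sketch.
strong_pairs <- function(x, block = 1000L, cutoff = 0.9) {
    p <- ncol(x)
    ## Centre and scale each column, then divide by sqrt(n - 1) so that
    ## crossprod() of two column blocks gives their correlations directly.
    z <- scale(x) / sqrt(nrow(x) - 1)
    starts <- seq(1L, p, by = block)
    out <- list()
    for (i in starts) {
        ii <- i:min(i + block - 1L, p)
        for (j in starts[starts >= i]) {      # upper triangle of blocks only
            jj <- j:min(j + block - 1L, p)
            r <- crossprod(z[, ii, drop = FALSE], z[, jj, drop = FALSE])
            hit <- which(abs(r) >= cutoff, arr.ind = TRUE)
            if (nrow(hit) > 0L) {
                res <- data.frame(row = ii[hit[, 1L]],
                                  col = jj[hit[, 2L]],
                                  r   = r[hit])
                ## drop the diagonal and the duplicated lower-triangle pairs
                out[[length(out) + 1L]] <- res[res$row < res$col, ]
            }
        }
    }
    do.call(rbind, out)   # NULL if no block produced a hit
}
```

Each block of 1000 x 1000 correlations takes about 8 Mb, so the working
storage is the standardized copy of the data plus one block, plus
whatever pairs survive the cutoff.  Whether a cutoff like this is
sensible depends entirely on the answer to the question above.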

> Please feel free to share your memory handling techniques in R.
>
> Thanks,
> Suman