[R] Very large matrices for very large genome

Duncan Murdoch dmurdoch at pair.com
Mon Apr 12 04:47:33 CEST 2004

On Sun, 11 Apr 2004 19:15:08 -0700 (PDT), you wrote:

>I am using R to look at whole-genome gene expression data. This means
>about 27,000 genes, each with a vector of numbers reflecting expression at
>different tissues and times.

How long is that vector?  Presumably shorter than 27000.

>I need to do an all against all co-expression
>calculation (basically, just calculate Pearson's r for every gene-gene
>pair). I try to store the result of such a thing in a 27000x27000 matrix,
>but R seems not to like allocating such a large beast. Any

If you have fewer than 27000 cases, then the correlation matrix is not
full rank, and could be summarized in much less space.  For example,
if you have 100 cases, then a 100x100 matrix will give the correlation
structure, and a 26900x100 matrix would give the weights for the rest
of the genes.

(It's late, so I might be wrong about this, but I don't think so.)
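
The rank claim can be checked on a small simulated example. This is only a
sketch: the sizes p and n and the random data are made up for illustration and
are not from the original question. One detail worth noting is that centring
each gene removes one dimension, so with n cases the correlation matrix has
rank at most n - 1 rather than n.

```r
set.seed(42)
p <- 200                               # genes (hypothetical; 27000 in the question)
n <- 10                                # cases, i.e. tissues/times (hypothetical)
X <- matrix(rnorm(p * n), nrow = p)    # simulated expression, one gene per row

R <- cor(t(X))                         # p x p gene-gene correlation matrix

# Numerical rank: count eigenvalues that are not negligibly small
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
rank_R <- sum(ev > 1e-8 * max(ev))
rank_R                                 # cannot exceed n - 1

# Storage for the full 27000 x 27000 matrix of doubles, in bytes
full_bytes <- 27000^2 * 8              # about 5.8e9, i.e. roughly 5.8 GB
```

The last line shows why the full matrix is awkward to allocate: at 8 bytes per
double it needs close to 6 GB before any copies are made.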

To calculate those matrices, just pick the first 100 genes to use for
the correlation matrix (assuming you get a full rank matrix that way),
then regress each of the others onto those.
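
A minimal sketch of that procedure, again on simulated data. The names (X, Z,
B, C_basis, W, cor_rest) and the sizes are hypothetical. It assumes the first
k genes give a full-rank correlation matrix, and it uses k = n - 1 basis genes
because centring leaves only n - 1 dimensions. Once each gene is centred and
scaled to unit length, the regression needs no intercept.

```r
set.seed(1)
p <- 500                               # genes (hypothetical; 27000 in the question)
n <- 20                                # cases (hypothetical)
X <- matrix(rnorm(p * n), nrow = p)    # simulated expression, one gene per row

# Standardise each gene: centre it, then scale to unit length,
# so that Z %*% t(Z) equals the correlation matrix of the genes.
Z <- X - rowMeans(X)
Z <- Z / sqrt(rowSums(Z^2))

k <- n - 1                             # centring removes one dimension
basis <- seq_len(k)                    # "just pick the first k genes"
rest  <- setdiff(seq_len(p), basis)

B <- Z[basis, , drop = FALSE]
C_basis <- tcrossprod(B)               # k x k correlation matrix of the basis genes
stopifnot(qr(C_basis)$rank == k)       # the full-rank assumption

# Regress every remaining gene onto the basis genes: one row of weights per gene
W <- t(solve(C_basis, B %*% t(Z[rest, , drop = FALSE])))   # (p - k) x k

# Correlation between the i-th and j-th non-basis genes, computed on demand
cor_rest <- function(i, j) {
  drop(W[i, , drop = FALSE] %*% C_basis %*% t(W[j, , drop = FALSE]))
}

# Compare with the direct calculation for one pair
all.equal(cor_rest(1, 2), cor(X[rest[1], ], X[rest[2], ]))
```

Only C_basis (k x k) and W ((p - k) x k) need to be kept. Correlations between
a non-basis gene and the basis genes are the rows of W %*% C_basis.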

Duncan Murdoch
