[BioC] Correlation works, but dist() runs out of memory

Sean Davis sdavis2 at mail.nih.gov
Tue Mar 13 17:14:32 CET 2007


On Tuesday 13 March 2007 11:34, Daniel Brewer wrote:
> I am attempting to plot a hierarchical clustering dendrogram of a
> reasonably modest-sized gene expression matrix of 22011 x 16.
>
> If I choose to use a correlation measure it works fine (
> c2 <- cor(ExonExpr)
> d2 <- as.dist(1-c2)
> hier2 <- hclust(d2,method="average")
> ).  If I try to create a Euclidean distance object it crashes out with a
> memory error (
>
> > Error in vector("double", length) : vector size specified is too large
>
> ).
>
> This seems strange as I have 3GB RAM, which I would think is plenty. Any
> ideas what is going wrong or how to get round this?

Hi, Dan.

You probably want to run dist() on the transposed matrix: dist() computes
distances between rows, whereas cor() computes correlations between columns.

> a <- matrix(rnorm(20000),nc=10) # a 2000 x 10 matrix
> b <- dist(a)
> dim(as.matrix(b))
[1] 2000 2000
> d <- cor(a)
> dim(d)
[1] 10 10

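Note the difference in sizes of the matrices.  On your 22011 x 16 matrix,
dist() would have to store 22011 * 22010 / 2 (about 242 million) distances,
roughly 1.9 GB of doubles, whereas cor() returns only a 16 x 16 matrix.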
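
A minimal sketch of the transposed approach, assuming genes are in rows and
samples in columns as in your matrix.  Here expr is a hypothetical stand-in
for ExonExpr, filled with random data and with fewer rows so it runs quickly:

```r
# Hypothetical stand-in for ExonExpr: 2000 genes (rows) x 16 samples (columns).
set.seed(1)
expr <- matrix(rnorm(2000 * 16), ncol = 16)

# dist() compares rows, so transpose first to compare the 16 samples,
# which is the same set of objects that cor(expr) compares.
d_euc <- dist(t(expr))                         # Euclidean is the default method
hier_euc <- hclust(d_euc, method = "average")  # same linkage as in your example

dim(as.matrix(d_euc))                          # one row/column per sample
plot(hier_euc)                                 # dendrogram of the samples
```

This clusters the samples rather than the genes, which is what your
correlation-based version was doing as well.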

Sean


