[BioC] Correlation works, but dist() runs out of memory

Daniel Brewer daniel.brewer at icr.ac.uk
Tue Mar 13 17:21:20 CET 2007


Apologies for the post.  The Bioconductor version was just a typo; I
meant 1.9.  I have found my error, though: I was trying to cluster on
samples, and I had not transposed the matrix before calculating the
distance matrix.  It is strange that cor() calculates between columns
whereas dist() calculates between rows.
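
To illustrate the difference, here is a minimal sketch using a small random
matrix in place of ExonExpr. The name `expr`, its 100 x 16 size, and the
random values are assumptions made up for the example; only cor(), dist(),
as.dist(), t() and hclust() from base R are used.

```r
# Hypothetical stand-in for the expression matrix:
# 100 genes (rows) x 16 samples (columns), filled with random values.
set.seed(1)
expr <- matrix(rnorm(100 * 16), nrow = 100, ncol = 16)

# cor() correlates columns, so this is already a 16 x 16
# sample-by-sample matrix.
c2 <- cor(expr)
d_cor <- as.dist(1 - c2)

# dist() measures distances between rows, so without t() it
# compares genes with genes.
d_genes <- dist(expr)        # choose(100, 2) pairwise distances

# Transposing first gives distances between the 16 samples instead.
d_samples <- dist(t(expr))   # choose(16, 2) pairwise distances

# Both sample-level distance objects go into hclust() the same way.
hier_cor <- hclust(d_cor, method = "average")
hier_euc <- hclust(d_samples, method = "average")
```

With a 22011 x 16 matrix, the untransposed call would ask for
choose(22011, 2) = 242231055 distances, whereas the transposed call needs
only choose(16, 2) = 120.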

Thanks for the input anyway.

Daniel

Wolfgang Huber wrote:
> Dear Daniel,
> 
> Please read the posting guide that recommends that you give a
> reproducible example and the output of sessionInfo. Also, there is no
> such thing as Bioconductor 0.9.
> 
> 1) Are you sure you are giving it "only" a 22011 x 16 matrix? I get
> 
>> a=numeric(2^31-1)
> Error in vector("double", length) : cannot allocate vector of length
> 2147483647
> 
>> a=numeric(2^31)
> Error in vector("double", length) : vector size specified is too large
> 
> and of course 2^31 >> choose(22011,2).
> 
> 2) choose(22011,2)*8/1e6 = 1937.85, i.e. one copy of your distance
> matrix would need about 2 GB of RAM, and if you have other large objects
> around or if it needs to be copied, your 3 GB of RAM may not be enough.
> Rather than using brute force, reducing the set of genes to an
> interesting subset before doing the clustering might help.
> 
>> sessionInfo()
> R version 2.5.0 Under development (unstable) (2007-03-13 r40832)
> i686-pc-linux-gnu
> 
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
> 
> 
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> [7] "base"
> 
> 
> Best wishes
>   Wolfgang
> 
> ------------------------------------------------------------------
> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
> 
> 
>> I am attempting to plot a hierarchical clustering dendrogram of a
>> reasonably modest-sized gene expression matrix of 22011 x 16.
>>
>> If I choose to use a correlation measure it works fine (
>> c2 <- cor(ExonExpr)
>> d2 <- as.dist(1-c2)
>> hier2 <- hclust(d2,method="average")
>> ).  If I try to create a Euclidean distance object it crashes out with a
>> memory error (
>>> Error in vector("double", length) : vector size specified is too large
>> ).
>>
>> This seems strange as I have 3 GB of RAM, which I would think is plenty.
>> Any ideas what is going wrong or how to get round this?
>>
>>
>> Thanks
>>
>> Dan
>>
>> PS Running R 2.4.1, Bioconductor 0.9 on SUSE 10.2 Linux.
>>
> 
> 

-- 
**************************************************************

Daniel Brewer, Ph.D.

Institute of Cancer Research
Molecular Carcinogenesis
MUCRC
15 Cotswold Road
Sutton, Surrey SM2 5NG
United Kingdom

Tel: +44 (0) 20 8722 4109
Fax: +44 (0) 20 8722 4141

Email: daniel.brewer at icr.ac.uk

**************************************************************
