[BioC] Re: [S] Error in clustering procedure

David K Pritchard dpritch at u.washington.edu
Wed Sep 8 00:41:04 CEST 2004


Christian,
    I think it is overstating the matter to say it is useless to hierarchically cluster 7000 genes.  In most studies that compare only two or a few conditions, there is generally not a lot of structure in the data and clustering is not useful.  However, I have been involved with rare experiments where there is a lot of structure in the data, and clustering the whole dataset (10K or 20K genes) is useful for seeing that structure.  I am presently analyzing an experiment in which overexpression of a gene is compared with overexpression of a number of mutant forms of that gene.  In this study, hierarchically clustering the data (20K genes) revealed structure that would have been hard to see otherwise.
     Clearly there is no good way to look at all of this data at one time.  However, programs like MEV from TIGR do a good job of presenting a useful interface for browsing that much data.  I also believe that MEV will hierarchically cluster ~20K genes, and it is freely available from the TIGR website.

David Pritchard


On Tue, 7 Sep 2004, cstrato wrote:

> Sorry, but I cannot resist:
> 
> Any comments from the microarray community on the usefulness of
> hierarchical clustering of 7000 genes?
> 
> Best regards
> Christian
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
> V.i.e.n.n.a.         .A.u.s.t.r.i.a
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> 
> Prof Brian Ripley wrote:
> 
> > A distance matrix on 7000 objects alone takes up 187Mb.  I don't know how 
> > your machine is set up re swap space, but you should use your task manager 
> > to monitor memory usage.  Almost certainly you are running out of memory.
> > 
> > However, I have never seen an agglomerative clustering of 7000 objects
> > make sense scientifically (not that that stops the bioinformatics people).  
> > I think you need either to work in smaller subsets or to combine objects
> > into clusters before starting.
> > 
> > On Tue, 7 Sep 2004, Joao Baptista de O. e Souza Filho wrote:
> > 
> > 
> >>I am working with SPLUS 2000 using Windows 2000 SP4, 512 MBytes RAM,
> >>3 GBytes of free space in HD.
> >>
> >>When I try to do an agglomerative clustering on a matrix of
> >>dimensions 7000 x 5, the program, after some time spent in
> >>calculations, returns the following error message:
> >>
> >>============================================================================================================================
> >>Error in disv == -1: Unable to obtain requested dynamic memory (this 
> >>request is for 200194252 bytes, 0 bytes already in use)
> >>============================================================================================================================
> >>
> >>Before that, I had used the command "options(object.size=300e6)", since the
> >>program presented the message:
> >>
> >>=================================================================================================================================
> >>Error in double(1 + (n * (n - 1))/2): Cannot allocate 200194208 bytes: 
> >>options("object.size") is 100000000: see options help file
> >>=================================================================================================================================
> >>
> >>Does anyone know how I should proceed?
> >>
> >>Thanks in advance
> >>
> >>Joao Baptista Filho
> >>
>
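
The memory figures quoted in this thread follow from storing one double per pair of objects. Below is a minimal Python sketch of that arithmetic, assuming 8-byte doubles and the 1 + n*(n-1)/2 layout that appears in the S-PLUS error message. The function names are illustrative only and are not part of S-PLUS, R, or MEV.

```python
import math

BYTES_PER_DOUBLE = 8  # assumption: distances are stored as 8-byte doubles


def condensed_distance_bytes(n, extra_elements=0):
    """Bytes needed for the n*(n-1)/2 pairwise distances among n objects.

    extra_elements covers any additional doubles in the same allocation,
    such as the leading 1 in double(1 + (n * (n - 1))/2).
    """
    return (n * (n - 1) // 2 + extra_elements) * BYTES_PER_DOUBLE


def objects_from_request(nbytes, extra_elements=1):
    """Invert the formula: return n if nbytes is exactly the size of
    (extra_elements + n*(n-1)/2) doubles, otherwise None."""
    pairs = nbytes // BYTES_PER_DOUBLE - extra_elements
    if pairs < 0:
        return None
    # Solve n*(n-1)/2 = pairs for n, then verify the result exactly.
    n = (1 + math.isqrt(1 + 8 * pairs)) // 2
    if condensed_distance_bytes(n, extra_elements) == nbytes:
        return n
    return None


if __name__ == "__main__":
    for n in (7000, 20000):
        mib = condensed_distance_bytes(n) / 2**20
        print(f"{n} objects: about {mib:.0f} MiB of pairwise distances")
    # The size quoted in the "Cannot allocate" error message.
    print("objects implied by a 200194208-byte request:",
          objects_from_request(200194208))
```

Under these assumptions, 7000 objects need 195,972,000 bytes, which is the roughly 187 MB mentioned above. The 200,194,208-byte request in the error message corresponds to 7075 objects, so the matrix apparently had slightly more than 7000 rows. Because the cost grows with the square of n, the same calculation for 20K genes comes to about 1.6 GB.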
