[BioC] Re: [S] Error in clustering procedure

Ramon Diaz-Uriarte rdiaz at cnio.es
Wed Sep 8 10:33:27 CEST 2004


On Tuesday 07 September 2004 21:17, cstrato wrote:
> Dear all
>
> First of all, I want to apologize to Prof. Ripley, since I forgot
> to ask him for permission before publishing his comment.
>
> Personally, I agree that this would be useless, as Prof. Ripley
> told me some years ago. However, almost everybody
> still seems to do it and publish the corresponding results.
> Companies such as Spotfire are proud that you can do hierarchical
> clustering with more than 20,000 genes.
> I have never seen a publication where it was done differently.


Part of this could be the result of imitative behavior, beliefs such as "unless I 
put in a neat heatmap, I won't get it past reviewers", etc., rather than evidence 
that it is the best way to go. If several companies are making an issue of it 
in their brochures, maybe it is because customers ask for clustering. As for 
"publish the corresponding results", I am not sure what the "results" are, 
since after clustering 7000 genes you can almost always make up a story after 
the fact; but I would not call that a result. 

I think clustering (and biclustering) do have a place, but I guess they should 
be used as a tool to answer some question (e.g., I think I understand what 
question a t-test is helping to answer; I am not sure about most clustering 
procedures), or as guidance for something, not as some kind of magic tool 
to "let the data speak for themselves" (i.e., a) get the microarray data; b) run 
a clustering procedure; c) come up with a question that your clusters 
"answered").

R.


>
> I think that the bioconductor list would be the best forum to
> discuss this issue, and provide solutions (besides the obvious
> suggestion to filter non-varying genes).
>
> Best regards
> Christian
>
> James W. MacDonald wrote:
> > cstrato wrote:
> >> Sorry, but I cannot resist:
> >>
> >> Any comments from the microarray community on the usefulness of
> >> hierarchical clustering of 7000 genes?
> >
> > I think this would be almost completely useless. First off, clustering
> > is not an inferential technique, so its use has been completely oversold
> > IMO to the biological community. Secondly, clustering is usually done to
> > produce a 'heat map' to put in a paper or flash on the screen during a
> > presentation. How on earth would this be of any use? You couldn't even
> > read any of the gene names!
> >
> > Of course you could use the heatmap to impress friends and colleagues
> > with the fact that you rate a computer powerful enough to *do* a heatmap
> > with a 7000 x 5 matrix ;-D
> >
> > Jim
> >
> >> Best regards
> >> Christian
> >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >> C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
> >> V.i.e.n.n.a.         .A.u.s.t.r.i.a
> >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)


