[BioC] Re: [S] Error in clustering procedure

cstrato cstrato at aon.at
Wed Sep 8 20:53:26 CEST 2004


Thank you all very much for your replies.

Three years ago, an identical question about a memory error during
clustering prompted me to start a similar discussion; see:
https://www.stat.math.ethz.ch/pipermail/r-help/2001-November/015524.html
https://www.stat.math.ethz.ch/pipermail/r-help/2001-December/015557.html

I have the feeling that nothing has changed since then, and personally
I am still uncomfortable doing clustering. For me, many of the
questions that I raised back then are still unresolved.
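
To make the memory issue concrete, here is a rough back-of-the-envelope
sketch in Python. It assumes the clustering routine first builds a full
pairwise distance matrix, stored as one 8-byte double per pair (lower
triangle only). The function name dist_matrix_bytes is made up for
illustration, and a real implementation may need additional working copies
on top of this.

```python
def dist_matrix_bytes(n_objects, bytes_per_value=8):
    """Bytes needed to hold one distance per unordered pair of objects.

    Assumption: distances are stored as a lower triangle of
    8-byte doubles, i.e. n * (n - 1) / 2 values.
    """
    n_pairs = n_objects * (n_objects - 1) // 2
    return n_pairs * bytes_per_value


if __name__ == "__main__":
    # 7000 and 20,000 are the gene counts mentioned in this thread.
    for n in (7000, 20000):
        mib = dist_matrix_bytes(n) / 2**20
        print(f"{n} objects: about {mib:.0f} MiB for the distances alone")
```

The storage grows quadratically with the number of genes, so filtering out
non-varying genes before clustering reduces the memory needed much faster
than it reduces the gene count.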

Best regards
Christian



Prof Brian Ripley wrote:
> Please note my comment was not about the usefulness of clustering or even
> of hierarchical clustering, but about the sub-optimality of 
> *agglomerative* clustering on large sets.
> 
> If you think you need clustering with thousands of objects there are in my
> experience always better ways to achieve the real objective than
> agglomerative clustering.  Typically people are looking for a few large
> clusters or outliers or many small clusters within already known larger
> groupings. In the case of a heatmap, clustering is being used to produce a
> 1D MDS (a seriation) for which better methods are known.
> 
> BDR
> 
> On Wed, 8 Sep 2004, Ramon Diaz-Uriarte wrote:
> 
> 
>>On Tuesday 07 September 2004 21:17, cstrato wrote:
>>
>>>Dear all
>>>
>>>First of all, I want to apologize to Prof. Ripley, since I forgot
>>>to ask him first for permission to publish his comment.
>>>
>>>Personally, I agree that this would be useless, as Prof. Ripley
>>>already told me some years ago. However, almost everybody
>>>still seems to do it and publish the corresponding results.
>>>Companies such as Spotfire are proud that you can do hierarchical
>>>clustering with more than 20,000 genes.
>>>I have never seen a publication where it was done differently.
>>
>>
>>Part of this could be the result of imitative behavior, beliefs that "unless I
>>put in a neat heatmap I won't get it past reviewers", etc., but that is not
>>evidence that it is the best way to go. If several companies are making an
>>issue out of it in their brochures, maybe it is because customers ask for
>>clustering. As for "publish the corresponding results", I am not sure what the
>>"results" are, since after clustering 7000 genes you can almost always make up
>>a story after the fact; I would not call that a result.
>>
>>I think clustering (and biclustering) do have a place, but I guess they should
>>be used as a tool to answer some question (e.g., I think I understand what
>>question a t-test is helping to answer; I am not sure about most clustering
>>procedures), or as guidance for something, not as some kind of magic tool
>>to "let the data speak for themselves" (i.e., a) get the microarray data; b) run
>>a clustering procedure; c) come up with a question that your cluster
>>"answered").
>>
>>R.
>>
>>
>>
>>>I think that the bioconductor list would be the best forum to
>>>discuss this issue, and provide solutions (besides the obvious
>>>suggestion to filter non-varying genes).
>>>
>>>Best regards
>>>Christian
>>>
>>>James W. MacDonald wrote:
>>>
>>>>cstrato wrote:
>>>>
>>>>>Sorry, but I cannot resist:
>>>>>
>>>>>Any comments from the microarray community on the usefulness of
>>>>>hierarchical clustering of 7000 genes?
>>>>
>>>>I think this would be almost completely useless. First off, clustering
>>>>is not an inferential technique, so its use has been completely oversold
>>>>IMO to the biological community. Secondly, clustering is usually done to
>>>>produce a 'heat map' to put in a paper or flash on the screen during a
>>>>presentation. How on earth would this be of any use? You couldn't even
>>>>read any of the gene names!
>>>>
>>>>Of course you could use the heatmap to impress friends and colleagues
>>>>with the fact that you rate a computer powerful enough to *do* a heatmap
>>>>with a 7000 x 5 matrix ;-D
>>>>
>>>>Jim
>>>>
>>>>
>>>>>Best regards
>>>>>Christian
>>>>>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>>>>>C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
>>>>>V.i.e.n.n.a.         .A.u.s.t.r.i.a
>>>>>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>>>
>>
>>
>


