[BioC] Re: [S] Error in clustering procedure

Mon Sep 13 11:08:34 CEST 2004

Great, that's what I was looking for!
Personally, I use cluster analysis sparingly and as a very "exploratory"
tool.  
I think, though I may be wrong, that most biologists realise its
limitations.
I also think that it is not "completely useless", and perhaps if people
do think a method is useless, they should suggest an alternative, which
you have.  Thank you!

M

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] 
Sent: 13 September 2004 10:03
To: michael watson (IAH-C)
Cc: Ramon Diaz-Uriarte; bioconductor at stat.math.ethz.ch; cstrato; James W. MacDonald
Subject: RE: [BioC] Re: [S] Error in clustering procedure

On Mon, 13 Sep 2004, michael watson (IAH-C) wrote:

> I guess I'm coming to this late,

You are, yet have overlooked important points in later parts of the 
thread.

> but I'm pretty sure that all biologists use cluster analysis for is
> finding out which genes are behaving similarly to one another in a
> large data set.

Really?  Have you never seen a heatmap with clustering on the margins?
There clustering is being used for seriation.

> Then if, for example, all
> genes from a certain pathway are showing a similar expression pattern,
> we have a hypothesis which can be tested further.
> 
> If cluster analysis has indeed been "over-sold", please suggest a 
> better algorithm for summarising groups of genes that are behaving 
> similarly across a group of experiments or time-points :-)

My point was about methods/algorithms for cluster analysis, as I have
already replied in this thread.

But MDS-like methods (note, not algorithms) are better for your stated 
purpose.
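
For readers unfamiliar with the term, here is a rough sketch of one
MDS-like method, classical (Torgerson) scaling. It is not code from this
thread, and it is not necessarily the method meant above; the function
name classical_mds, its parameters, and the toy distance matrix are all
assumptions made for illustration. It takes a symmetric matrix of
pairwise dissimilarities (between genes, say) and returns low-dimensional
coordinates whose Euclidean distances approximate them. In R, cmdscale()
does the same job far more efficiently.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) scaling of a symmetric distance matrix.

    Hypothetical helper for illustration only. Returns one list of
    `dims` coordinates per item.
    """
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the leading eigenvector of the current B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue with its sign.
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))
        if lam <= 1e-9:
            break  # no positive eigenvalue left; remaining axes stay 0
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


if __name__ == "__main__":
    # Toy dissimilarities among three items (a 3-4-5 triangle).
    toy = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
    for point in classical_mds(toy):
        print(point)
```

The orientation and sign of the returned axes are arbitrary; only the
distances between points carry meaning. For expression data the
dissimilarity could be, for example, one minus the correlation between
gene profiles, though that choice is an assumption as well.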

> 
> M
> 
> -----Original Message-----
> From: Ramon Diaz-Uriarte [mailto:rdiaz at cnio.es]
> Sent: 08 September 2004 09:33
> To: bioconductor at stat.math.ethz.ch
> Cc: Prof Brian Ripley; cstrato; James W. MacDonald
> Subject: Re: [BioC] Re: [S] Error in clustering procedure
> 
> 
> On Tuesday 07 September 2004 21:17, cstrato wrote:
> > Dear all
> >
> > First of all, I want to apologize to Prof. Ripley, since I forgot to
> > ask him first for permission to publish his comment.
> >
> > Personally, I agree that this would be useless, as Prof. Ripley has
> > already told me some years ago. However, almost everybody still seems
> > to do it and publish the corresponding results. Companies such as
> > Spotfire are proud that you can do hierarchical clustering with more
> > than 20,000 genes. I have never seen a publication where it was done
> > differently.
> 
> 
> Part of this could be the result of imitative behavior, beliefs that
> "unless I put a neat heatmap I won't get it past reviewers", etc., but
> it is not evidence that it is the best way to go. If several companies
> are making an issue out of it in their brochures, maybe it is because
> customers ask for clustering. As for "publish the corresponding
> results", I am not sure what the "results" are, since after clustering
> 7000 genes you can almost always make up a story after the fact; but I
> would not call that a result.
> 
> I think clustering (and biclustering) do have a place, but I guess they
> should be used as a tool to answer some question (e.g., I think I
> understand what question a t-test is helping to answer; I am not sure
> about most clustering procedures), or as guidance for something, not as
> some kind of magic tool to "let the data speak for themselves" ( = a)
> get the microarray data; b) run a clustering procedure; c) come up with
> a question that your cluster "answered".)
> 
> R.
> 
> 
> >
> > I think that the bioconductor list would be the best forum to discuss
> > this issue, and provide solutions (besides the obvious suggestion to
> > filter non-varying genes).
> >
> > Best regards
> > Christian
> >
> > James W. MacDonald wrote:
> > > cstrato wrote:
> > >> Sorry, but I cannot resist:
> > >>
> > >> Any comments from the microarray community on the usefulness of
> > >> hierarchical clustering of 7000 genes?
> > >
> > > I think this would be almost completely useless. First off,
> > > clustering is not an inferential technique, so its use has been
> > > completely oversold IMO to the biological community. Secondly,
> > > clustering is usually done to produce a 'heat map' to put in a paper
> > > or flash on the screen during a presentation. How on earth would
> > > this be of any use? You couldn't even read any of the gene names!
> > >
> > > Of course you could use the heatmap to impress friends and
> > > colleagues with the fact that you rate a computer powerful enough to
> > > *do* a heatmap with a 7000 x 5 matrix ;-D
> > >
> > > Jim
> > >
> > >> Best regards
> > >> Christian
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > >> C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
> > >> V.i.e.n.n.a.         .A.u.s.t.r.i.a
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595