[BioC] Re: [S] Error in clustering procedure
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Sep 13 11:03:14 CEST 2004
On Mon, 13 Sep 2004, michael watson (IAH-C) wrote:
> I guess I'm coming to this late,
You are, yet have overlooked important points in later parts of the
thread.
> but I'm pretty sure all that biologists use
> cluster analysis for is finding out which genes are behaving
> similarly to one another in a large data set.
Really? Have you never seen a heatmap with clustering on the margins?
There clustering is being used for seriation.
> Then if, for example, all
> genes from a certain pathway are showing a similar expression pattern,
> we have a hypothesis which can be tested further.
>
> If cluster analysis has indeed been "over-sold", please suggest a better
> algorithm for summarising groups of genes that are behaving similarly
> across a group of experiments or time-points :-)
My point was about methods/algorithms for cluster analysis, as I have
already replied in this thread.
But MDS-like methods (note, not algorithms) are better for your stated
purpose.
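
For illustration only, here is a rough sketch of one MDS-like method,
classical (metric) scaling, applied to a handful of gene expression profiles.
Nothing in it comes from this thread: the function names (classical_mds,
double_center, top_eigenpair), the choice of Euclidean distance, and the toy
data are all assumptions, and a plain power iteration stands in for a proper
eigen-solver.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def double_center(d2):
    """Return B = -1/2 * J * D2 * J for a symmetric squared-distance matrix."""
    n = len(d2)
    row = [sum(r) / n for r in d2]  # row means (equal to column means)
    total = sum(row) / n            # grand mean
    return [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]


def top_eigenpair(b, iters=500):
    """Leading eigenvalue and unit eigenvector of b by power iteration."""
    n = len(b)
    v = [1.0 + i for i in range(n)]  # fixed, arbitrary starting vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:             # nothing left to extract
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
    return lam, v


def classical_mds(profiles, dims=2):
    """Embed profiles in `dims` dimensions, approximating their distances."""
    n = len(profiles)
    d2 = [[euclidean(p, q) ** 2 for q in profiles] for p in profiles]
    b = double_center(d2)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        lam, v = top_eigenpair(b)
        lam = max(lam, 0.0)
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflate: remove the component just extracted.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy data: four hypothetical genes measured under three conditions.
profiles = [[0, 0, 0], [3, 0, 0], [0, 4, 0], [3, 4, 0]]
coords = classical_mds(profiles, dims=2)
```

Genes with similar profiles land close together in the resulting map, with
no tree or cluster boundaries imposed on them. For real data a library
routine (e.g. cmdscale in R) would be the sensible choice; the loop above
is only meant to show the steps.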
>
> M
>
> -----Original Message-----
> From: Ramon Diaz-Uriarte [mailto:rdiaz at cnio.es]
> Sent: 08 September 2004 09:33
> To: bioconductor at stat.math.ethz.ch
> Cc: Prof Brian Ripley; cstrato; James W. MacDonald
> Subject: Re: [BioC] Re: [S] Error in clustering procedure
>
>
> On Tuesday 07 September 2004 21:17, cstrato wrote:
> > Dear all
> >
> > First of all, I want to apologize to Prof. Ripley, since I forgot to
> > ask him first for permission to publish his comment.
> >
> > Personally, I agree that this would be useless, as Prof. Ripley has
> > already told me some years ago. However, almost everybody still seems
> > to do it and publish the corresponding results. Companies such as
> > Spotfire are proud that you can do hierarchical clustering with more
> > than 20,000 genes. I have never seen a publication where it was done
> > differently.
>
>
> Part of this could be the result of imitative behavior, or of beliefs such as
> "unless I put in a neat heatmap I won't get it past reviewers", but that is not
> evidence that it is the best way to go. If several companies are making an issue out of it
> in their brochures, maybe it is because customers ask for clustering. As for
> "publish the corresponding results" I am not sure what the "results" are,
> since after clustering 7000 genes you can almost always make up a story after
> the fact; but I would not call that a result.
>
> I think clustering (and biclustering) do have a place, but I guess they should
> be used as a tool to answer some question (e.g., I think I understand what
> question a t-test is helping to answer; I am not sure about most clustering
> procedures), or as guidance for something, not as some kind of magic tool
> to "let the data speak for themselves" (i.e., a) get the microarray data; b) run
> a clustering procedure; c) come up with a question that your cluster
> "answered".)
>
> R.
>
>
> >
> > I think that the bioconductor list would be the best forum to discuss
> > this issue, and provide solutions (besides the obvious suggestion to
> > filter non-varying genes).
> >
> > Best regards
> > Christian
> >
> > James W. MacDonald wrote:
> > > cstrato wrote:
> > >> Sorry, but I cannot resist:
> > >>
> > >> Any comments of the microarray community on the usefulness of
> > >> hierarchical clustering of 7000 genes?
> > >
> > > I think this would be almost completely useless. First off,
> > > clustering is not an inferential technique, so its use has been
> > > completely oversold IMO to the biological community. Secondly,
> > > clustering is usually done to produce a 'heat map' to put in a paper
> > > or flash on the screen during a presentation. How on earth would
> > > this be of any use? You couldn't even read any of the gene names!
> > >
> > > Of course you could use the heatmap to impress friends and
> > > colleagues with the fact that you rate a computer powerful enough to
> > > *do* a heatmap with a 7000 x 5 matrix ;-D
> > >
> > > Jim
> > >
> > >> Best regards
> > >> Christian
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > >> C.h.r.i.s.t.i.a.n. .S.t.r.a.t.o.w.a
> > >> V.i.e.n.n.a. .A.u.s.t.r.i.a
> > >> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595