[BioC] Common microarray clustering method

Jenny Bryan jenny at stat.ubc.ca
Mon Dec 13 18:27:39 CET 2004

Dear Ilya,

For better or for worse, the most commonly used method is
agglomerative hierarchical clustering.  For gene clustering, people
then tend to prune the resulting tree very high; that is, one often
chooses K (the number of clusters) to be much smaller than G (the
number of genes/probes/etc.).  If you intend to cut a hierarchical
tree this coarsely, however, it is wiser to do so with a tree produced
by a *divisive* algorithm.  This choice produces a more stable result
statistically, i.e. small perturbations in the input data tend to
create 'small' perturbations in the output.  An easy, well-established
way to do this is to use the 'diana' function from the R 'cluster'
package.  Agglomerative methods are also, of course, available via
'hclust' (in stats, formerly mva) and 'agnes' (in cluster).
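To make the top-down idea concrete, here is a minimal from-scratch Python sketch of the splitting rule that DIANA (Kaufman and Rousseeuw's divisive analysis) is built on. It is an illustration only, not the code inside the 'cluster' package, and the function names (divisive_clusters, split, etc.) are hypothetical. It assumes a symmetric dissimilarity matrix d given as a list of lists. At each step it splits the cluster with the largest diameter: the object with the largest average dissimilarity seeds a "splinter" group, and objects keep moving over while they are, on average, closer to the splinter group than to the rest. Splitting stops once K clusters exist. Because child clusters never have a larger diameter than their parent, this should (ties aside) match cutting the full divisive tree at K clusters. In R, the corresponding step is roughly cutree(as.hclust(diana(x)), k = K).

```python
from itertools import combinations


def diameter(cluster, d):
    """Largest pairwise dissimilarity within a cluster (0 for a singleton)."""
    return max((d[i][j] for i, j in combinations(cluster, 2)), default=0.0)


def avg_diss(i, group, d):
    """Average dissimilarity from object i to the members of group, excluding i."""
    others = [j for j in group if j != i]
    return sum(d[i][j] for j in others) / len(others) if others else 0.0


def split(cluster, d):
    """One DIANA-style split of `cluster` into (remainder, splinter)."""
    remainder = list(cluster)
    # Seed the splinter group with the object most dissimilar, on average, to the rest.
    seed = max(remainder, key=lambda i: avg_diss(i, remainder, d))
    remainder.remove(seed)
    splinter = [seed]
    while len(remainder) > 1:
        # Positive gain: the object is closer, on average, to the splinter group.
        gains = {i: avg_diss(i, remainder, d) - avg_diss(i, splinter, d)
                 for i in remainder}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # nobody prefers the splinter group; the split is final
        remainder.remove(best)
        splinter.append(best)
    return remainder, splinter


def divisive_clusters(d, k):
    """Split top-down, always dividing the widest cluster, until k clusters exist."""
    clusters = [list(range(len(d)))]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break  # every cluster is already a singleton
        widest = max(splittable, key=lambda c: diameter(c, d))
        clusters.remove(widest)
        clusters.extend(split(widest, d))
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy 1-D "expression values"; dissimilarity is absolute difference.
    x = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
    d = [[abs(a - b) for b in x] for a in x]
    print(divisive_clusters(d, 3))  # [[0, 1, 2], [3, 4], [5]]
```

With K = 2 the same data give [[0, 1, 2, 3, 4], [5]]: the most isolated value is peeled off first, and the remaining group is only divided further when a third cluster is requested.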


Ilya Venger writes:
 > Hi,
 > I'm looking for the most commonly used microarray clustering method. The
 > main problem is deciding on the appropriate number of clusters to
 > use in the clustering, for both the agglomerative and the partitioning
 > methods.
 > I know there are certain procedures such as MSS or v-fold cross 
 > validation, which I might run. This would allow me to compare 
 > clusterings resulting from incrementing K (the number of clusters to use).
 > The main point is not only to find the best clustering algorithm but,
 > no less importantly, a commonly used one.
 > Thanks,
 > Ilya.
