[R] Cluster Analysis - Number of Clusters

Christian Hennig chrish at stats.ucl.ac.uk
Mon Feb 6 14:38:36 CET 2006


Hi,

as mentioned before, the function cluster.stats in package fpc computes 
some statistics that can be used to estimate the number of clusters. These 
are distance-based and are not the "pseudo F" or "pseudo T^2" statistics. 
They are documented in Gordon (1999), Classification (see ?cluster.stats 
for more references). cluster.stats also returns the average silhouette 
width of Kaufman and Rousseeuw (1990) (exact reference in ?plot.agnes), 
which is also part of the output of some functions in package cluster 
(pam, agnes, ...?).
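
A minimal sketch of how the average silhouette width from cluster.stats 
could be compared over several numbers of clusters. The toy data, the range 
of k and the use of pam for the partitions are assumptions made only for 
illustration; any clustering method that returns integer labels would do.

```r
library(cluster)  # pam()
library(fpc)      # cluster.stats()

set.seed(1)
# Made-up toy data: three well-separated groups of 30 points in 2 dimensions
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 6, sd = 0.3), ncol = 2))
d <- dist(x)  # Euclidean distances; cluster.stats works on distances

ks <- 2:6     # candidate numbers of clusters (an arbitrary range)
asw <- sapply(ks, function(k) {
  cl <- pam(d, k)$clustering          # partition into k clusters
  cluster.stats(d, cl)$avg.silwidth   # average silhouette width
})
names(asw) <- ks

best_k <- ks[which.max(asw)]          # k with the largest average silhouette width
print(round(asw, 3))
print(best_k)
```

For pam the same quantity is also available directly as 
pam(d, k)$silinfo$avg.width, so cluster.stats is only needed if the other 
statistics it returns are of interest as well.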

An alternative way to estimate the number of clusters is to use the BIC 
together with a (normal) mixture model; see package mclust.
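
A minimal sketch of the mixture model approach with Mclust from package 
mclust. The toy data and the range 1 to 6 for the number of components are 
assumptions made only for illustration.

```r
library(mclust)  # Mclust()

set.seed(1)
# Made-up toy data: three well-separated groups of 30 points in 2 dimensions
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 6, sd = 0.3), ncol = 2))

# Fit normal mixture models with 1 to 6 components; Mclust keeps the
# combination of covariance model and number of components with the best BIC
fit <- Mclust(x, G = 1:6)

print(fit$G)          # number of mixture components selected by BIC
print(fit$modelName)  # covariance model selected by BIC
print(fit$BIC)        # BIC values for all fitted models
```

Note that mclust defines the BIC so that larger values are better, which is 
the opposite sign convention to the one used in many textbooks.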

Best,
Christian


On Sun, 5 Feb 2006, John Janmaat wrote:

> Hello,
>
> I'm playing around with cluster analysis, and am looking for methods to
> select the number of clusters.  I am aware of methods based on a 'pseudo
> F' or a 'pseudo T^2'.  Are there packages in R that will generate these
> statistics, and/or other statistics to aid in cluster number selection?
>
> Thanks,
>
> John.
> -- 
> ===========================================================================
> Dr. John Janmaat                       Tel: 902-585-1461
> Department of Economics                Fax: 902-585-1070
> Acadia University                      Email: jjanmaat at acadiau.ca
> Wolfville, Nova Scotia, Canada.        Web: ace.acadiau.ca/~jjanmaat/

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche



