[R] hclust with method = “ward”

Christian Hennig chrish at stats.ucl.ac.uk
Wed Oct 6 18:48:12 CEST 2010


The k-means/Ward criterion can be written purely in terms of squared 
Euclidean distances, without involving the cluster means. For each cluster, 
sum the squared dissimilarities over all ordered pairs of observations in 
that cluster and divide by the cluster size; the criterion is half the sum 
of these quantities over all clusters. This can also be computed for a 
general dissimilarity matrix (as is done, for example, by cluster.stats in 
package fpc).
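
Written out as a formula, the description above reads as follows. The notation is my own shorthand and not taken from any package: C_1, ..., C_K are the clusters, n_k is the size of C_k, d_ij is the dissimilarity between observations i and j, and the inner double sum runs over all ordered pairs in a cluster. The second line holds only when d_ij is the Euclidean distance between x_i and x_j, with \bar{x}_k the mean of cluster C_k.

```latex
W \;=\; \frac{1}{2} \sum_{k=1}^{K} \frac{1}{n_k}
        \sum_{i \in C_k} \sum_{j \in C_k} d_{ij}^{2}

% Euclidean case, d_{ij} = \lVert x_i - x_j \rVert:
\frac{1}{2 n_k} \sum_{i \in C_k} \sum_{j \in C_k} \lVert x_i - x_j \rVert^{2}
  \;=\; \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^{2}
```

The first line needs nothing but the dissimilarities, which is why it can still be evaluated when no means are available.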

I'd guess that hclust with method="ward" uses this when run with a general 
dissimilarity matrix. At least it would make sense, although I'm not sure 
whether it really is what hclust does, because I didn't check the 
underlying Fortran code.
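
This does not settle what hclust does internally, but the distance-only criterion itself is easy to evaluate, so here is a minimal sketch in R. The helper names ward_criterion and wss_from_means are hypothetical (they are not functions from base R or fpc), the data are made up, and the partition is arbitrary rather than produced by hclust. The sketch only checks that the pairwise form agrees with the usual mean-based sum of squares for Euclidean distances, and that it can still be computed for a non-Euclidean dissimilarity.

```r
# Pairwise form of the k-means/Ward criterion, using only dissimilarities.
# ward_criterion is a hypothetical helper name, not part of base R or fpc.
ward_criterion <- function(d, clustering) {
  d2 <- as.matrix(d)^2                  # full symmetric matrix of squared dissimilarities
  total <- 0
  for (k in unique(clustering)) {
    idx <- which(clustering == k)
    # sum over all ordered pairs (i, j) in cluster k, divided by the cluster size
    total <- total + sum(d2[idx, idx, drop = FALSE]) / length(idx)
  }
  total / 2
}

# Mean-based form: sum of squared Euclidean distances to the cluster means.
wss_from_means <- function(x, clustering) {
  total <- 0
  for (k in unique(clustering)) {
    xk <- x[clustering == k, , drop = FALSE]
    centred <- sweep(xk, 2, colMeans(xk))   # subtract the column means
    total <- total + sum(centred^2)
  }
  total
}

set.seed(1)
x  <- matrix(rnorm(40), ncol = 2)       # 20 made-up points in 2 dimensions
cl <- rep(1:3, length.out = nrow(x))    # an arbitrary 3-cluster partition

pairwise_value <- ward_criterion(dist(x), cl)
mean_value     <- wss_from_means(x, cl)
print(all.equal(pairwise_value, mean_value))   # compares the two forms

# The pairwise form can also be evaluated when no means are available:
manhattan_value <- ward_criterion(dist(x, method = "manhattan"), cl)
print(manhattan_value)
```

For a non-Euclidean dissimilarity the resulting number no longer has an interpretation as a sum of squared distances to cluster means; it is just the same formula applied to whatever dissimilarities are supplied.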

Note that I may have missed postings in this thread, so sorry if this 
doesn't add to what you already have worked out.

Christian

On Wed, 6 Oct 2010, PeterB wrote:

>
> Apparently, the same issue exists in SAS, where there is an option to run the
> Ward algorithm based only on the distance matrix. Perhaps a SAS user could
> confirm that, or even check with SAS.
>
> Peter

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche


