[R] hclust with method = “ward”

PeterB pxbeqa at rit.edu
Thu Oct 7 03:11:05 CEST 2010


Thanks, Christian. This is really helpful.

I was not aware of that identity, but now I can see it. I think you mean that
the inner sum runs over all entries of the (squared) distance matrix for that
cluster, so each pairwise distance is counted twice, which is why we divide by
2.
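A quick numerical check of the identity may help. This is a minimal sketch, assuming Euclidean data in the plane and an arbitrary labelling of random points; the function names `centroid_wss`, `dissimilarity_wss` and `squared_euclidean` are hypothetical and only for illustration. The first function computes the usual within-cluster sum of squares around the cluster means; the second uses only the matrix of squared distances, summing over all ordered pairs (so each pair is counted twice), dividing by the cluster size, and halving the total.

```python
import random


def centroid_wss(points, labels):
    """Within-cluster sum of squares around each cluster mean (k-means/Ward criterion)."""
    total = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        n = len(members)
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / n for d in range(dim)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return total


def dissimilarity_wss(sqdist, labels):
    """Same criterion from squared dissimilarities only, no means needed."""
    total = 0.0
    for k in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        # Sum over all ordered pairs (i, j): every pair appears twice.
        inner = sum(sqdist[i][j] for i in idx for j in idx)
        total += inner / len(idx)  # inner sum divided by the cluster size
    return total / 2  # halve, compensating for the double counting


def squared_euclidean(points):
    """Full matrix of squared Euclidean distances."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


random.seed(1)
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(12)]
labs = [i % 3 for i in range(12)]  # arbitrary three-cluster labelling
w1 = centroid_wss(pts, labs)
w2 = dissimilarity_wss(squared_euclidean(pts), labs)
print(w1, w2)
```

The two printed values should agree up to floating-point rounding. Only `dissimilarity_wss` needs nothing beyond a dissimilarity matrix, which is why the same quantity can be evaluated for non-Euclidean dissimilarities.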

Peter


Christian Hennig wrote:
> 
> The k-means/Ward criterion can be written in terms of squared Euclidean
> distances in a way that doesn't involve cluster means. For each cluster,
> sum the squared dissimilarities over all ordered pairs of observations in
> that cluster and divide by the cluster size; the criterion is half the
> sum of these values over all clusters:
> 
>   W = (1/2) * sum_k (1/n_k) * sum_{i,j in C_k} d_ij^2
> 
> This can also be computed for a general dissimilarity matrix (this is
> done, for example, by cluster.stats in package fpc).
> 
> I'd guess that hclust with method="ward" uses this when run with a general 
> dissimilarity matrix. At least it would make sense, although I'm not sure 
> whether it really is what hclust does, because I didn't check the 
> underlying Fortran code.
> 
> Note that I may have missed postings in this thread, so sorry if this 
> doesn't add to what you already have worked out.
> 
> Christian
> 
> On Wed, 6 Oct 2010, PeterB wrote:
> 
>>
>> Apparently, the same issue exists in SAS, where there is an option to run
>> the Ward algorithm based only on the distance matrix. Perhaps a SAS user
>> could confirm this, or even check with SAS.
>>
>> Peter
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/hclust-with-method-ward-tp2952140p2965310.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
> 
> 
> 
-- 
View this message in context: http://r.789695.n4.nabble.com/hclust-with-method-ward-tp2952140p2966045.html
Sent from the R help mailing list archive at Nabble.com.
