[R] clustering with hclust

Christian Hennig ucakche at ucl.ac.uk
Fri Jul 25 13:19:19 CEST 2014


Dear Marianna,

the function agnes in the cluster package can compute Ward's method from a 
raw data matrix (at least this is what the help page suggests).

Also, you may not be using the most recent version of hclust. The most 
recent version has a note in its help page that states:

"Two different algorithms are found in the literature for Ward clustering. 
The one used by option "ward.D" (equivalent to the only Ward option "ward" 
in R versions <= 3.0.3) does not implement Ward's (1963) clustering 
criterion, whereas option "ward.D2" implements that criterion (Murtagh and 
Legendre 2013). With the latter, the dissimilarities are squared before 
cluster updating. Note that agnes(*, method="ward") corresponds to 
hclust(*, "ward.D2")."
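
A minimal sketch of what that note implies, on made-up random data (the 
matrix x, the seed and k = 3 are assumptions for illustration only, not 
Marianna's data). It compares "ward.D2" on plain Euclidean distances with 
"ward.D" on hand-squared distances, and then with agnes on the raw matrix:

```r
library(cluster)  # recommended package shipped with R; provides agnes()

set.seed(42)
x <- matrix(rnorm(60), nrow = 20)   # made-up data: 20 observations, 3 variables
d <- dist(x)                        # plain (unsquared) Euclidean distances

# "ward.D2" squares the dissimilarities itself before cluster updating
hc_d2 <- hclust(d, method = "ward.D2")
# "ward.D" applies the same update to whatever it is given, so square by hand
hc_d <- hclust(d^2, method = "ward.D")

# Same merge sequence; the "ward.D" heights are on the squared scale
same_merges  <- identical(hc_d2$merge, hc_d$merge)
same_heights <- isTRUE(all.equal(hc_d2$height, sqrt(hc_d$height)))

# agnes takes the raw data matrix; per the help-page note it should
# correspond to "ward.D2" (compared here via a 3-cluster cut)
ag <- agnes(x, method = "ward")
same_partition <- identical(unname(cutree(as.hclust(ag), k = 3)),
                            unname(cutree(hc_d2, k = 3)))
```

So if another program expects squared Euclidean distances for Ward, the 
closest match in hclust should be either "ward.D2" on unsquared distances 
or "ward.D" on squared ones, keeping in mind that the heights then differ 
by a square root.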

The Murtagh and Legendre paper has more details on this and is here:
http://arxiv.org/abs/1111.6285
F. Murtagh and P. Legendre, "Ward's hierarchical clustering method: 
clustering criterion and agglomerative algorithm"

It's not clear to me why one would want to use Ward's method for this kind 
of data, but that's your decision of course.

Best wishes,
Christian


On Fri, 25 Jul 2014, Marianna Bolognesi wrote:

> Hi everybody, I have a problem with a cluster analysis.
>
> I am trying to use hclust, method=ward.
>
> The Ward method works with SQUARED Euclidean distances.
>
> Hclust demands "a dissimilarity structure as produced by dist".
>
> Yet, dist does not seem to produce a table of squared euclidean distances,
> starting from cosines.
> In fact, computing manually the squared euclidean distances from cosines
> (d=2(1-cos)) produces a different outcome.
>
> As a consequence, using hclust with ward method on a table of cosines
> transformed into distances with dist, produces a different dendrogram than
> other programs for hierarchical clustering with ward method (e.g.
> MultiDendrograms). Weird right??
>
> Computing manually the distances and then feeding them to hclust produces
> an error message. So, I am wondering, what the hell is this dist function
> doing?!
>
> thanks!
>
> marianna
>
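
On the question of what dist is doing: a rough sketch, assuming the input 
is a square cosine-similarity matrix (the objects x, xn and cosm below are 
made up for illustration). dist() treats its argument as a raw data matrix 
and computes Euclidean distances between its rows, so calling it on a 
cosine table does not give 2(1-cos). One likely cause of the error with 
hand-computed distances is that hclust wants an object of class "dist", 
which as.dist() provides:

```r
set.seed(1)
x  <- matrix(rnorm(40), nrow = 8)   # made-up data: 8 items, 5 features
xn <- x / sqrt(rowSums(x^2))        # scale each row to unit length
cosm <- tcrossprod(xn)              # cosine similarities between items

# For unit-length vectors, squared Euclidean distance equals 2 * (1 - cos)
d2_manual <- 2 * (1 - cosm)
d2_direct <- as.matrix(dist(xn))^2
formula_ok <- isTRUE(all.equal(d2_manual, d2_direct, check.attributes = FALSE))

# dist() on the cosine table itself measures distances between rows of
# that table, which is a different quantity
d_rows <- dist(cosm)
differs <- !isTRUE(all.equal(as.vector(d_rows), as.vector(as.dist(d2_manual))))

# Convert the hand-computed matrix to class "dist" before calling hclust;
# pmax() clamps tiny negative rounding errors before taking the square root
d  <- as.dist(sqrt(pmax(d2_manual, 0)))
hc <- hclust(d, method = "ward.D2")  # takes unsquared distances
```

Here as.dist() only reinterprets an existing distance matrix, whereas 
dist() computes new distances from data rows.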

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
c.hennig at ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche


