[R] question about centroid-linkage (cluster analysis)

james.foadi at diamond.ac.uk
Thu Dec 10 14:26:04 CET 2009


Dear R community,
I would be grateful if somebody could shed light on the following.

I have created a set of 6 points to check how centroid
agglomeration works in cluster analysis:

> Y <- data.frame(x=c(-1,1,1,-1,10,12),y=c(1,1,-1,-1,0,0))

Intuitively, the last two clusters to be joined will be {1,2,3,4} and {5,6}.
The centroid of the first cluster has coordinates (0,0), while the centroid
of the second cluster has coordinates (11,0). Therefore, the distance between
these two clusters should be 11. But:

> Y.dist <- dist(Y)
> Y.hc_c <- hclust(Y.dist,method="centroid")
> Y.hc_c$merge
     [,1] [,2]
[1,]   -1   -2
[2,]   -3    1
[3,]   -4    2
[4,]   -5   -6
[5,]    3    4
> Y.hc_c$height
[1] 2.000000 1.914214 1.517428 2.000000 9.692575


So, from this it would appear that the distance between the last two clusters is 9.692575!
How can it be?
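
The expected value of 11 can be checked by hand. The following is a minimal sketch, assuming that "centroid" means the plain coordinate-wise mean of each cluster; the names c1, c2 and centroid_dist are only illustrative and are not part of the session above.

```r
# Same six points as in the post
Y <- data.frame(x = c(-1, 1, 1, -1, 10, 12), y = c(1, 1, -1, -1, 0, 0))

# Centroids of the two final clusters, taken as coordinate-wise means
c1 <- colMeans(Y[1:4, ])   # cluster {1,2,3,4}
c2 <- colMeans(Y[5:6, ])   # cluster {5,6}

# Euclidean distance between the two centroids
centroid_dist <- sqrt(sum((c1 - c2)^2))
print(centroid_dist)
```

This is the value I would expect to see as the last entry of Y.hc_c$height.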

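One possible lead, offered as an assumption rather than a confirmed answer: the centroid example in ?hclust passes squared Euclidean distances, i.e. dist(...)^2, and the centroid update formula is only exact for squared distances. The sketch below repeats the clustering on dist(Y)^2 and takes the square root of the final height; Y.hc_sq and last_height are illustrative names.

```r
# Same six points as in the post
Y <- data.frame(x = c(-1, 1, 1, -1, 10, 12), y = c(1, 1, -1, -1, 0, 0))

# Assumption: centroid linkage is meant to be fed squared Euclidean distances
Y.hc_sq <- hclust(dist(Y)^2, method = "centroid")

# Heights are then squared centroid distances, so take the square root
last_height <- sqrt(max(Y.hc_sq$height))
print(last_height)
```

If this assumption holds, the last merge height on the squared scale corresponds to the hand-computed centroid distance between {1,2,3,4} and {5,6}.
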
J

Dr James Foadi PhD
Membrane Protein Laboratory (MPL)
Diamond Light Source Ltd
Diamond House
Harewell Science and Innovation Campus
Chilton, Didcot
Oxfordshire OX11 0DE

Email    :  james.foadi at diamond.ac.uk
Alt Email:  j.foadi at imperial.ac.uk
