[R] Cluster analysis: dissimilar results between R and SPSS

Sarah Goslee sarah.goslee at gmail.com
Mon Apr 26 15:00:02 CEST 2010


I'm not sure why you'd expect Euclidean distance and squared Euclidean
distance to give the same results.

Euclidean distance is the square root of the sum of the squared
differences across the variables, and that's exactly what dist() returns.

http://en.wikipedia.org/wiki/Euclidean_distance

On a map, it's the length of the hypotenuse, and you can measure it
with a ruler and get the same number. Euclidean distance has a specific
geometric meaning.
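
To make that concrete, here is a small sketch with two made-up points
(they are illustrative only, not taken from the data in question). It
computes the distance by hand and compares it with dist():

```r
# Two points on a flat map: 3 units apart east-west, 4 units north-south.
# These coordinates are invented for illustration.
pts <- rbind(a = c(0, 0), b = c(3, 4))

# Square root of the sum of squared differences, computed by hand.
by_hand <- sqrt(sum((pts["a", ] - pts["b", ])^2))

# dist() with method = "euclidean" (its default) should agree.
from_dist <- as.numeric(dist(pts, method = "euclidean"))

by_hand       # 5, the hypotenuse of a 3-4-5 triangle
from_dist     # 5
from_dist^2   # 25, the squared Euclidean distance
```

The ruler on the map reads 5, not 25.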

Squared Euclidean distance is not the same thing, and it is not the
standard definition, as you seem to be expecting. If squared distance is
what you want, then square the output of dist() before you perform the
clustering.
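
A minimal sketch of what squaring the output of dist() could look like.
The data here are a randomly generated stand-in with the same shape as
yours (277 rows, 8 variables coded 1/2), so the clusters themselves mean
nothing. The object names are my own. I also assume a recent R, where the
method that R 2.9.2 calls "ward" is named "ward.D"; on an older R, use
method = "ward" instead.

```r
# Hypothetical stand-in for the poster's data: 277 rows, 8 binary
# variables coded 1 = yes, 2 = no. The real data are not in the thread.
set.seed(1)
data <- matrix(sample(1:2, 277 * 8, replace = TRUE), nrow = 277,
               dimnames = list(NULL, paste0("v", 1:8)))

d    <- dist(data, method = "euclidean")  # plain Euclidean distances
d_sq <- d^2                               # squared Euclidean distances

# "ward.D" is the R >= 3.1.0 name for the method called "ward" in R 2.9.2.
fit_plain   <- hclust(d,    method = "ward.D")
fit_squared <- hclust(d_sq, method = "ward.D")

# Cut each tree into 4 clusters and cross-tabulate the memberships to see
# how far the two solutions agree.
groups_plain   <- cutree(fit_plain,   k = 4)
groups_squared <- cutree(fit_squared, k = 4)
table(plain = groups_plain, squared = groups_squared)

# Per-cluster means of each variable for the squared-distance solution.
round(sapply(split(as.data.frame(data), groups_squared), colMeans), 2)
```

Using cutree() rather than rect.hclust() gives the cluster memberships
without needing an open plot, which makes the comparison easier to script.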

Sarah

On Mon, Apr 26, 2010 at 8:37 AM, Jeoffrey Gaspard
<jeoffrey.gaspard at gmail.com> wrote:
> Hello everyone!
>
> My data is composed of 277 individuals measured on 8 binary variables
> (1=yes, 2=no).
>
> I did two similar cluster analyses, one on SPSS 18.0 and one on R 2.9.2. The
> objective is to have the means for each variable per retained cluster.
>
> 1) The R analysis ran as follows:
>
>> call data
>> dist=dist(data,method="euclidean")
>> cluster=hclust(dist,method="ward")
>> cluster
>
> Call:
> hclust(d = dist, method = "ward")
>
> Cluster method   : ward
> Distance         : euclidean
> Number of objects: 277
>
>> plot(cluster)
>> rect.hclust(cluster, k=4, border="red")
>> x=rect.hclust(cluster, k=4, border="red")
>> sapply(x, function(i) colMeans(data[i,]))
>> round(sapply(x, function(i) colMeans(data[i,])),2)
>
> 2) The SPSS analysis ran as follows:
>
> Analysis --> Classify --> Hierarchical cluster analysis --> Cluster method=
> Ward's method and Distance measure= Interval:  Squared Euclidean distance.
> After that, I computed the means of each variable for each cluster.
>
> The problem is I have different results between the two analyses (different
> clusters and means).
>
> However, when I use the "Euclidean distance" (unsquared) in SPSS, I have the
> same results!
>
> I thought the R "euclidean" command meant the "usual square distance between
> the two vectors (2 norm)" as specified in the documentation, not the
> unsquared distance. Does it not?
>
> Thanks for the comment!
>
> Jeffrey
>
>



-- 
Sarah Goslee
http://www.functionaldiversity.org


