[R] distance in the function kmeans

Christian Hennig fm3a004 at math.uni-hamburg.de
Fri May 28 12:06:40 CEST 2004


On Fri, 28 May 2004, Martin Maechler wrote:

> >>>>> "n\" == n\ bouget <n>
> >>>>>     on Fri, 28 May 2004 09:37:35 +0200 writes:
> 
>     n\> Hi, I want to know which distance is used in the
>     n\> function kmeans and whether we can change this distance.
>     n\> Indeed, in the function pam, we can pass a distance
>     n\> matrix as an argument (with the line
>     n\> "pam<-pam(dist(matrixdata),k=7)" ) but we can't do that in
>     n\> the function kmeans; we have to pass the data matrix
>     n\> directly ...  Thanks in advance, Nicolas BOUGET
> 
> It might be interesting to look at this from the pam()
> perspective:
> What exactly is pam() lacking that kmeans() does for you?
> 
> Christian, are you suggesting that pam() could do the job if
> 
> 1) there was a dist(., method="a la kmeans") 
> 2) pam() allowed to be started by a user-specified set of
> 	 medoids instead of the "Kaufman-Rousseeuw-optimal" ones
> ?

The k-means criterion is equivalent to:
Find a partition C = C_1 \cup ... \cup C_k such that
\sum_{i=1}^k \sum_{x_j,x_l \in C_i} d(x_j,x_l)/|C_i|
is minimal,

where d is the squared Euclidean distance (see the Bock book). You may
wonder what clustering this would lead to with another distance.
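
As a minimal sketch in R, the equivalence can be checked numerically. The
data and the helper name pairwise_criterion are made up for illustration,
and I assume that each unordered pair {x_j, x_l} is counted once (counting
ordered pairs would double the value without changing the minimizer):

```r
# Sketch: compare the pairwise form of the criterion with the
# within-cluster sum of squares reported by kmeans().
set.seed(1)
x <- matrix(rnorm(60), ncol = 2)          # 30 illustrative points in the plane
km <- kmeans(x, centers = 3, nstart = 10)

# Hypothetical helper: sum over clusters of
# (sum of squared Euclidean distances over unordered pairs) / |C_i|.
pairwise_criterion <- function(x, cluster) {
  total <- 0
  for (g in unique(cluster)) {
    xi <- x[cluster == g, , drop = FALSE]
    d2 <- as.matrix(dist(xi))^2           # squared distances within C_i
    total <- total + sum(d2[upper.tri(d2)]) / nrow(xi)
  }
  total
}

crit <- pairwise_criterion(x, km$cluster)
all.equal(crit, km$tot.withinss)          # the two forms should agree
```

Note that the pairwise form never uses a cluster mean, so any other
dissimilarity could be plugged in for d2.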

The difference from pam is that pam minimizes sums of distances to centroid
objects (medoids), which have to be part of the dataset. k-means does not
need centroid objects; no "mean objects" are needed. Thus, pam with squared
Euclidean distances is a kind of approximation to k-means. (In practice,
both algorithms only approximate a global optimum.)
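
A rough way to see this in R, assuming the cluster package is installed
(the data and the helper wss are illustrative, not taken from either
function's internals):

```r
library(cluster)                          # provides pam()

set.seed(1)
x <- matrix(rnorm(60), ncol = 2)          # illustrative data
d2 <- dist(x)^2                           # squared Euclidean dissimilarities

pm <- pam(d2, k = 3, diss = TRUE)
km <- kmeans(x, centers = 3, nstart = 10)

# Sum of squared distances from each point to the medoid of its cluster.
d2m <- as.matrix(d2)
medoid_cost <- sum(d2m[cbind(seq_len(nrow(x)), pm$id.med[pm$clustering])])

# Hypothetical helper: k-means criterion (squared distances to the
# cluster means) for an arbitrary partition.
wss <- function(x, cluster) {
  total <- 0
  for (g in unique(cluster)) {
    xi <- x[cluster == g, , drop = FALSE]
    total <- total + sum(sweep(xi, 2, colMeans(xi))^2)
  }
  total
}

c(medoid_cost = medoid_cost,
  wss_pam     = wss(x, pm$clustering),
  wss_kmeans  = km$tot.withinss)
table(pam = pm$clustering, kmeans = km$cluster)
```

For any fixed partition the medoid-based sum cannot be smaller than the
mean-based one, because the mean minimizes the sum of squared distances
within a cluster. Whether the two partitions coincide depends on the data.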

A further version would arise if other distances were allowed: the pam
criterion would be optimized, but the cluster centers would be allowed to
lie anywhere, not only on an object of the sample.
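
For a single cluster and unsquared Euclidean distance, such a free center
can be sketched with a general-purpose optimizer. This is only an
illustration under my own assumptions (made-up data, optim() with its
default method, started at the medoid), not an established algorithm:

```r
set.seed(1)
xi <- matrix(rnorm(40), ncol = 2)         # one illustrative cluster, 20 points

# Sum of (unsquared) Euclidean distances from all points to a center.
cost <- function(center) sum(sqrt(rowSums(sweep(xi, 2, center)^2)))

# pam-style center: the best sample point (medoid).
medoid_costs <- apply(xi, 1, cost)
medoid <- xi[which.min(medoid_costs), ]

# Unrestricted center: numerical minimization started at the medoid.
free <- optim(medoid, cost)

c(medoid = min(medoid_costs), free = free$value)
```

Doing this inside every step of a partitioning algorithm is where the
extra computational effort would come from.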

Of course, pam and the original k-means are more or less easy to compute,
while the suggested alternatives may be computationally complex.

Best,
Christian


> 
> Regards,
> Martin Maechler
> 

***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de



