[R] kmeans (again)

Luis Torgo ltorgo at liacc.up.pt
Thu Jun 5 20:04:35 CEST 2003


Regarding a previous question about the kmeans function: I have tried the 
same example and I also get a strange result (at least judging by what the 
help page for kmeans says). The function appears to disregard the initial 
cluster centers it is given. According to the help page:

 centers: Either the number of clusters or a set of initial cluster
          centers...

Now a small dataset:
> data<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)

If I use rows 3 and 4 as the initial cluster centers and allow a single 
iteration of kmeans, I get:
> kmeans(data,data[c(3,4),],1)
$cluster
[1] 1 1 1 1 2 2

$centers
   [,1] [,2]
1 0.875 2.25
2 8.000 2.50

$withinss
[1] 32.9375  6.5000

$size
[1] 4 2

If I instead use rows 1 and 6 as the initial cluster centers, I get exactly 
the same solution after the first iteration:

> kmeans(data,data[c(1,6),],1)
$cluster
[1] 1 1 1 1 2 2

$centers
   [,1] [,2]
1 0.875 2.25
2 8.000 2.50

$withinss
[1] 32.9375  6.5000

$size
[1] 4 2

So the function appears to disregard the initial cluster centers. This is 
even "confirmed" by the fact that if I call the function without initial 
centers, giving only the number of clusters, I get the same solution (only 
the cluster labels are swapped):
> kmeans(data,2,1)
$cluster
[1] 2 2 2 2 1 1

$centers
   [,1] [,2]
1 8.000 2.50
2 0.875 2.25

$withinss
[1]  6.5000 32.9375

$size
[1] 2 4
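
As a point of comparison, a single assignment-and-update step of textbook 
(Lloyd-style) k-means can be written by hand. This is only a sketch: 
lloyd_step is a hypothetical helper, not part of R, and it assumes that "one 
iteration" means one such step, which may not match what kmeans does 
internally.

```r
# Same six points as above (the matrix is filled column by column).
data <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)

# Hypothetical helper: one assignment + update step of Lloyd-style k-means.
# Assumes every center ends up with at least one point.
lloyd_step <- function(x, centers) {
  k <- nrow(centers)
  # Squared Euclidean distance from every point to every center (n x k).
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  # Assign each point to its nearest center.
  cluster <- max.col(-d2, ties.method = "first")
  # Recompute each center as the mean of the points assigned to it.
  new_centers <- t(sapply(seq_len(k), function(j)
    colMeans(x[cluster == j, , drop = FALSE])))
  list(cluster = cluster, centers = new_centers)
}

from_3_4 <- lloyd_step(data, data[c(3, 4), ])
from_1_6 <- lloyd_step(data, data[c(1, 6), ])

from_3_4$cluster  # 1 1 1 2 1 2
from_1_6$cluster  # 1 1 1 2 2 2
```

Under that assumption the two starting configurations give different 
partitions after one step, and neither matches the 1 1 1 1 2 2 partition 
that kmeans reports in all three calls above.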



-- 
Luis Torgo
    FEP/LIACC, University of Porto   Phone : (+351) 22 607 88 30
    Machine Learning Group           Fax   : (+351) 22 600 36 54
    R. Campo Alegre, 823             email : ltorgo at liacc.up.pt
    4150 PORTO   -  PORTUGAL         WWW   : http://www.liacc.up.pt/~ltorgo



