[R] kmeans (again)

Liaw, Andy andy_liaw at merck.com
Fri Jun 6 04:19:35 CEST 2003


Just because you get the same answer from different starting points doesn't
mean the algorithm isn't using the starting points you specified.

I tried:

> set.seed(1)
> x <- matrix(rnorm(12), 6, 2)
> kmeans(x, x[c(1,6),], 1)
$cluster
[1] 2 1 2 1 1 2

$centers
        [,1]      [,2]
1  0.7028106 0.6482392
2 -0.7608503 0.4843512

$withinss
[1] 2.86861843 0.04450923

$size
[1] 3 3

> kmeans(x, 2, 1)
$cluster
[1] 2 1 2 1 1 2

$centers
        [,1]      [,2]
1  0.7028106 0.6482392
2 -0.7608503 0.4843512

$withinss
[1] 2.86861843 0.04450923

$size
[1] 3 3

> kmeans(x, x[c(3,4),], 1)
$cluster
[1] 1 1 1 2 1 1

$centers
        [,1]       [,2]
1 -0.3538799  0.7406319
2  1.5952808 -0.3053884

$withinss
[1] 2.089050 0.000000

$size
[1] 5 1

which shows that the result *can* depend on the starting values.
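
To make the dependence explicit, here is a minimal hand-written sketch of a
single assign-then-update pass, run on the small dataset from the quoted
message below. The helper name one_step and the variable dat are hypothetical,
and the plain Lloyd-style pass is an assumption made for illustration: it is
not necessarily what kmeans() does internally, so its output need not match
the kmeans() results shown in this thread.

```r
# Hypothetical helper: one assign-then-update pass (Lloyd-style sketch).
# Not the kmeans() implementation; empty clusters are not handled.
one_step <- function(x, centers) {
  k <- nrow(centers)
  # n x k matrix of squared Euclidean distances from each point to each center
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  # Assign each point to its nearest center
  cluster <- apply(d2, 1, which.min)
  # Update each center to the mean of the points assigned to it
  new_centers <- t(sapply(seq_len(k), function(j)
    colMeans(x[cluster == j, , drop = FALSE])))
  list(cluster = cluster, centers = new_centers)
}

# Dataset from the quoted message (named dat here to avoid masking data())
dat <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)

a <- one_step(dat, dat[c(3, 4), ])  # start from rows 3 and 4
b <- one_step(dat, dat[c(1, 6), ])  # start from rows 1 and 6

a$cluster  # 1 1 1 2 1 2
b$cluster  # 1 1 1 2 2 2
```

With this simple pass the two starts give different partitions after one
step, so the starting centers clearly enter the computation.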

Andy

> -----Original Message-----
> From: Luis Torgo [mailto:ltorgo at liacc.up.pt]
> Sent: Thursday, June 05, 2003 2:05 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] kmeans (again)
> 
> 
> Regarding a previous question concerning the kmeans function I've tried
> the same example and I also get a strange result (at least according to
> what is said in the help of the function kmeans). Apparently, the function
> is disregarding the initial cluster centers one gives it. According to the
> help of the function:
> 
>  centers: Either the number of clusters or a set of initial cluster
>           centers...
> 
> Now a small dataset:
> > data<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)
> 
> If I use rows 3 and 4 as cluster centers and a single iteration of kmeans
> I get:
> > kmeans(data,data[c(3,4),],1)
> $cluster
> [1] 1 1 1 1 2 2
> 
> $centers
>    [,1] [,2]
> 1 0.875 2.25
> 2 8.000 2.50
> 
> $withinss
> [1] 32.9375  6.5000
> 
> $size
> [1] 4 2
> 
> If I now use rows 1 and 6 as cluster centers I get exactly the same
> solution after the first iteration:
> 
> > kmeans(data,data[c(1,6),],1)
> $cluster
> [1] 1 1 1 1 2 2
> 
> $centers
>    [,1] [,2]
> 1 0.875 2.25
> 2 8.000 2.50
> 
> $withinss
> [1] 32.9375  6.5000
> 
> $size
> [1] 4 2
> 
> So, apparently the function is disregarding the initial cluster centers
> information. This is even "confirmed" by the fact that if I use the
> function without cluster centers, simply giving the number of clusters,
> I get the same solution:
> > kmeans(data,2,1)
> $cluster
> [1] 2 2 2 2 1 1
> 
> $centers
>    [,1] [,2]
> 1 8.000 2.50
> 2 0.875 2.25
> 
> $withinss
> [1]  6.5000 32.9375
> 
> $size
> [1] 2 4
> 
> 
> 
> -- 
> Luis Torgo
>     FEP/LIACC, University of Porto   Phone : (+351) 22 607 88 30
>     Machine Learning Group           Fax   : (+351) 22 600 36 54
>     R. Campo Alegre, 823             email : ltorgo at liacc.up.pt
>     4150 PORTO   -  PORTUGAL         WWW   : http://www.liacc.up.pt/~ltorgo
> 
