[R] kmeans cluster stability

Prof Brian D Ripley ripley at stats.ox.ac.uk
Tue Mar 13 22:44:28 CET 2001


On Tue, 13 Mar 2001, Marc Feldesman wrote:

> I'm doing kmeans partitioning on a small (n=26) dataset that has 5
> variables.  I noticed that if I repeatedly run the same command, the
> cluster centers change and the cluster membership changes.
>
> Using RW1022 under Windows NT & Windows 2000
>
>  >kmeans(pottery[,1:5], 4, 20)
>
> [...snip]
> $size
> [1] 7 3 9 7
> [...snip]
> $size
> [1]  7 10  4  5
> [...snip]
> $size
> [1]  6 10  5  5
>
> yields a different answer every time I run it.  Sometimes the answer is
> different only in the order of withinss (and the ordering of the numbers of
> cases assigned to each group).  Other times there are completely different
> centers, withinss and completely different cluster configurations.  This
> variability doesn't happen in either S-Plus 2000 or S-Plus 6.0 (Beta 2).
>
> I can see from the help that the R kmeans() function chooses a random set
> of rows as cluster centers if the initial centers aren't specified, while
> S-Plus uses hclust() and cutree() to determine the initial clusters.
>
> Is there any way to "make" kmeans results persist under repeated uses of
> the same command?

Call set.seed(123) before each kmeans() run, or specify the initial centers.
Don't assume that S-PLUS is getting a better answer, BTW.
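
A minimal sketch of both options follows. It assumes a made-up 26 x 5 matrix
x in place of pottery[,1:5], which is not reproduced in this thread. Taking
the first four rows as starting centers is an arbitrary choice for
illustration only, not a recommendation.

```r
# Hypothetical stand-in for pottery[,1:5]: 26 cases, 5 variables, made-up values.
set.seed(1)
x <- matrix(rnorm(26 * 5), nrow = 26, ncol = 5)

# Option 1: reset the random seed immediately before each call, so the
# same random rows are drawn as initial centers every time.
set.seed(123)
fit1 <- kmeans(x, 4, 20)
set.seed(123)
fit2 <- kmeans(x, 4, 20)
identical(fit1$cluster, fit2$cluster)

# Option 2: pass a matrix of initial centers (one row per cluster) instead
# of a count, so no random selection takes place at all.
init <- x[1:4, ]            # arbitrary illustrative choice of starting rows
fit3 <- kmeans(x, init, 20)
fit4 <- kmeans(x, init, 20)
identical(fit3$cluster, fit4$cluster)

# Different starts can reach different local optima; comparing the total
# within-cluster sum of squares shows which solution is tighter.
sum(fit1$withinss)
sum(fit3$withinss)
```

Either approach makes repeated runs agree with each other. Neither says
anything about whether the partition found is the best one, so it can be
worth comparing sum(withinss) across several different starts.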

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



