[R] cluster size

Fri Dec 11 16:47:13 CET 2009

Dear Ms Karunambigai,

the kmeans algorithm depends on random initialisation, which is why your
cluster sizes differ from run to run.
There are two basic strategies that can be applied in order to make your 
results reproducible:
1) Fix the random number generator by means of set.seed (see ?set.seed) 
before you run kmeans. The problem with this is that your solution can 
only be reproduced using the same random seed; it technically still is 
random.
2) Specify fixed initial centers, using the centers argument in kmeans.
(Sensible initial centers may be obtained by running hclust with Ward's 
method, cutting the tree into the desired number of clusters with cutree, 
and computing the centers of the resulting clusters. Sorry that I don't 
have the time right now to explain how to do that precisely; the help 
pages and hopefully some understanding of what is going on may help you 
further.)
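
As a minimal sketch of both strategies, under stated assumptions: the
matrix x below is simulated stand-in data (not the as1 data from the
question), k = 5 is taken from the question, and the method name "ward.D2"
assumes a recent R version (older versions only offered method = "ward").

```r
# Hypothetical stand-in data: 300 rows, 4 numeric columns.
set.seed(1)
x <- matrix(rnorm(300 * 4), ncol = 4)

# Strategy 1: set the same seed immediately before each kmeans call.
set.seed(123)
fit_a <- kmeans(x, centers = 5)
set.seed(123)
fit_b <- kmeans(x, centers = 5)
identical(fit_a$cluster, fit_b$cluster)

# Strategy 2: deterministic initial centers from Ward hierarchical clustering.
hc <- hclust(dist(x), method = "ward.D2")
groups <- cutree(hc, k = 5)
# Column means within each hierarchical cluster, one row per cluster.
init_centers <- rowsum(x, groups) / as.vector(table(groups))
fit_c <- kmeans(x, centers = init_centers)
table(fit_c$cluster)
```

With a matrix of centers, kmeans draws no random starting points, so
repeating the last call on the same data gives the same partition.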

An alternative strategy that will not absolutely guarantee reproducibility 
but will make your results more stable is to use kmeansruns in package fpc, 
which is a wrapper that runs kmeans several times and gives you the best 
of the solutions found. That should reproduce its outcome with higher 
probability (though not precisely 1).
I don't know whether the default value runs=100 is sufficient to give a 
stable solution for your data, but increasing the runs parameter may help.
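
A rough sketch of the repeated-runs idea, again on simulated stand-in data
rather than the as1 data from the question. The first call uses the nstart
argument of kmeans in stats as a base-R analogue (a swapped-in technique,
not kmeansruns itself); the kmeansruns call is only attempted if fpc is
installed, and the values krange = 5 and runs = 200 are illustrative
assumptions.

```r
# Hypothetical stand-in data: 300 rows, 4 numeric columns.
set.seed(1)
x <- matrix(rnorm(300 * 4), ncol = 4)

# Base-R analogue: 100 random starts, keeping the fit with the
# smallest total within-cluster sum of squares.
fit_multi <- kmeans(x, centers = 5, iter.max = 100, nstart = 100)
print(sort(fit_multi$size))

# The fpc wrapper, run only if the package is available.
if (requireNamespace("fpc", quietly = TRUE)) {
  fit_fpc <- fpc::kmeansruns(x, krange = 5, runs = 200)
  print(sort(fit_fpc$size))
}
```

Comparing the sorted cluster sizes over a few repetitions gives a quick
impression of whether the chosen number of runs is enough for the data at
hand; cluster labels themselves may still be permuted between runs.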

Cheers,
Christian

On Fri, 11 Dec 2009, karuna m wrote:

> Hi r-help,
> I am doing kmeans clustering with the stats package. I tried clustering into five clusters using:
> kcl1 <- kmeans(as1[, c("contlife", "somlife", "agglife", "sexlife",
>                        "rellife", "hordlife", "doutlife", "symtlife",
>                        "washlife", "chcklife", "rptlife", "countlife",
>                        "coltlife", "ordlife")],
>                5, iter.max = 10, nstart = 1, algorithm = "Hartigan-Wong")
> table(kcl1$cluster)
> Every time I get five clusters of different sizes, for example the first time with cluster sizes
> table(kcl1$cluster)
>   1   2   3   4   5
> 140  72 105  98 112
> second time with cluster sizes
> table(kcl1$cluster)
>   1   2   3   4   5
>  91 149 106  76 105
> and so on.
> I would like to know whether there is any function that gives the same cluster sizes every time kmeans clustering is run.
> Thanks in advance.
> regards,
> Ms.Karunambigai M
> PhD Scholar
> Dept. of Biostatistics
> NIMHANS
> Bangalore
> India

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche