[R] Empty clusters in k-means - possible solution

Simon Chamaillé s.chamaille at yahoo.fr
Wed May 15 10:39:51 CEST 2013


Hello all,

k-means algorithms can at times fail because one of the clusters becomes 
empty. In this case, the kmeans R function returns:
"empty cluster: try a better set of initial centers"

This has been discussed several times on several R-lists, and is NOT a 
bug, but it can be annoying when using k-means in complex simulations, 
where this error brings everything to a stop. One can use try() or 
tryCatch() to avoid this, but this is only a workaround.
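
For reference, the tryCatch() workaround can be sketched roughly as below. 
The helper name safe_kmeans and its max_tries argument are made up for 
this illustration, and it assumes 'centers' is given as a number of 
clusters, so that each retry draws new random starting centers. Note that 
it swallows every error from kmeans(), not only the empty-cluster one.

```r
# Hypothetical helper (not part of base R): call kmeans() again with a new
# random start whenever it stops with an error such as "empty cluster".
safe_kmeans <- function(x, centers, max_tries = 10, ...) {
  for (i in seq_len(max_tries)) {
    # tryCatch() turns any error into NULL so that the loop can retry
    fit <- tryCatch(kmeans(x, centers, ...), error = function(e) NULL)
    if (!is.null(fit)) return(fit)
  }
  stop("kmeans() failed in all ", max_tries, " attempts")
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)   # 100 points in 2 dimensions
fit <- safe_kmeans(x, centers = 3)
fit$size                            # number of points in each cluster
```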

I was wondering if anyone knows of an R implementation of k-means that 
prevents this problem from occurring. A very simple algorithm is proposed in 
(Pakhira, A Modified k-means Algorithm to Avoid Empty
Clusters; International Journal of Recent Trends in Engineering, Vol 1, 
No. 1, May 2009), in which the solution is simply to add the current 
cluster centers to the datapoints when computing new cluster centers at 
the next iteration. I could code that in pure R but that would be really 
slow, and I'm too dumb to modify the current internal implementation. If 
guys in R-dev think it is worth it, maybe this could be an option 
available in a future version of kmeans?
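
To make the idea concrete, here is a minimal pure-R sketch of how I read 
the modification: each current center is counted as one extra member of 
its own cluster when the new centers are computed, so a cluster that 
receives no points simply keeps its center. The function name 
kmeans_keep_centers, its arguments and the stopping rule are my own 
choices for illustration, not taken from the paper, and the loop is a 
plain Lloyd-style iteration rather than the algorithm used inside 
kmeans().

```r
# Hypothetical function: Lloyd-style k-means in which every current center
# is added to the points of its cluster before averaging.
# 'centers' is a k x p matrix of initial centers.
kmeans_keep_centers <- function(x, centers, iter_max = 100, tol = 1e-8) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  for (iter in seq_len(iter_max)) {
    # Squared Euclidean distance from every point to every center (n x k)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    cluster <- max.col(-d, ties.method = "first")   # nearest center
    counts <- tabulate(cluster, nbins = k)
    # Sum of the points in each cluster; an empty cluster gives a zero row
    sums <- matrix(0, k, ncol(x))
    for (j in seq_len(k)) {
      sums[j, ] <- colSums(x[cluster == j, , drop = FALSE])
    }
    # Modified update: the old center counts as one more point
    new_centers <- (sums + centers) / (counts + 1)
    shift <- max(abs(new_centers - centers))
    centers <- new_centers
    if (shift < tol) break   # centers have stopped moving
  }
  list(cluster = cluster, centers = centers, size = counts)
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
init <- x[sample(nrow(x), 3), , drop = FALSE]   # random rows as starts
fit <- kmeans_keep_centers(x, init)
fit$size
```

Because the old center is averaged in, the centers approach the cluster 
means gradually instead of jumping to them, so this variant can need more 
iterations than the usual update.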

Any suggestion would be appreciated.

simon


