[R] Empty clusters in k-means - possible solution
s.chamaille at yahoo.fr
Wed May 15 10:39:51 CEST 2013
k-means algorithms can at times fail because one of the clusters becomes
empty. In this case, the kmeans R function stops with the error:
"empty cluster: try a better set of initial centers"
This has been discussed several times on several R lists, and is NOT a
bug, but can be annoying when using k-means in complex simulations, where
this error brings everything to a stop. One can use try() or tryCatch()
to avoid this, but this is just a programming trick.
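
For reference, the tryCatch() workaround might look something like the
sketch below. The helper name safe_kmeans and its max_tries argument are
made up for illustration. It assumes centers is given as a number of
clusters, so that each retry draws new random starting centers, and it
treats any error from kmeans() as a reason to retry, not only the
empty-cluster one.

```r
# Hypothetical helper (not part of R): retry kmeans() with fresh random
# starts whenever a call fails.
safe_kmeans <- function(x, centers, max_tries = 10, ...) {
  for (i in seq_len(max_tries)) {
    # Any error from kmeans() is turned into NULL so the loop can retry.
    fit <- tryCatch(kmeans(x, centers, ...), error = function(e) NULL)
    if (!is.null(fit)) return(fit)
  }
  NULL  # every attempt failed; the caller decides what to do next
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
fit <- safe_kmeans(x, centers = 3)
```

A simulation loop can then test is.null(fit) and skip or redo that
replicate instead of stopping.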
I was wondering if anyone knows of an R implementation of k-means that
prevents this problem from happening. A very simple algorithm is proposed in
(Pakhira, A Modified k-means Algorithm to Avoid Empty
Clusters; International Journal of Recent Trends in Engineering, Vol 1,
No. 1, May 2009), in which the solution is simply to add the current
cluster centers to the datapoints when computing new cluster centers at
the next iteration. I could code that in pure R but that would be really
slow, and I'm too dumb to modify the current internal implementation. If
the R-dev guys think it is worth it, maybe this could be an option
available in a future version of kmeans?
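
To make the idea concrete, here is a rough pure-R sketch of the update
rule as I described it above: each new center is the mean of the points
assigned to the cluster plus the cluster's previous center. It is written
from that one-sentence description, not checked against the paper, so
treat it as an assumption about the method. The function name
modified_kmeans, the tol stopping rule and the random initialisation are
my own choices, and I have not benchmarked its speed.

```r
# Hypothetical sketch: Lloyd-style k-means where the previous center is
# counted as one extra point of its own cluster, so the update is always
# defined even if no data point is assigned to a cluster.
modified_kmeans <- function(x, k, iter.max = 100, tol = 1e-8) {
  x <- as.matrix(x)
  # Start from k distinct rows of x chosen at random (my own choice).
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  for (iter in seq_len(iter.max)) {
    # Squared Euclidean distances, points in rows and centers in columns.
    d <- outer(rowSums(x^2), rowSums(centers^2), "+") -
      2 * tcrossprod(x, centers)
    cluster <- max.col(-d, ties.method = "first")  # nearest center
    # 0/1 membership matrix (n x k) and number of data points per cluster.
    ind <- outer(cluster, seq_len(k), "==") + 0
    size <- colSums(ind)
    # (sum of assigned points + old center) / (cluster size + 1)
    new_centers <- (crossprod(ind, x) + centers) / (size + 1)
    moved <- max(abs(new_centers - centers))
    centers <- new_centers
    if (moved < tol) break
  }
  # cluster and size refer to the assignment made before the last update.
  list(cluster = cluster, centers = centers, size = size, iter = iter)
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
fit <- modified_kmeans(x, k = 3)
```

Note that size counts data points only, so a cluster can still end up with
no data points in it; the difference is that its center stays defined
and no error is raised.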
Any suggestion would be appreciated.