[R] finding a stable cluster for kmeans

Wayne.W.Jones at shell.com Wayne.W.Jones at shell.com
Tue Sep 25 13:02:01 CEST 2007


Hi there, 

If the final cluster assignments change with the random starting configuration, then I suspect that your data do not cluster very well.
A few reasons for this may be: 

1) There are genuinely no clusters in the data!
2) You have chosen a poor distance measure.
3) You have picked an inappropriate number of clusters.
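
One way to see how strongly the result depends on the starting configuration is to repeat kmeans from several random starts and compare the objective values. This is only a minimal sketch: it uses the built-in iris measurements as stand-in data (an assumption, not the poster's mydata) and 4 centers to mirror the original call. The nstart argument of kmeans tries several random starts and keeps the fit with the smallest total within-cluster sum of squares.

```r
# Stand-in data: the four numeric columns of iris (an assumption for illustration)
x <- scale(iris[, 1:4])

# Ten separate runs, each from a single random start; record the objective
set.seed(1)
wss <- sapply(1:10, function(i) kmeans(x, centers = 4, iter.max = 50)$tot.withinss)
print(round(wss, 2))   # widely varying values suggest an unstable solution

# nstart = 25 tries 25 random starts and returns the best of them
set.seed(1)
best <- kmeans(x, centers = 4, iter.max = 50, nstart = 25)
print(best$tot.withinss)
```

If the values in wss differ a lot from run to run, that points to one of the three reasons listed above rather than to a problem with kmeans itself.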

The basic goodness-of-fit criterion for a clustering is that the variance within clusters is small and the variance between clusters is large.
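
The within and between quantities can be read directly off a kmeans fit. A small sketch, again assuming the iris measurements as stand-in data and an arbitrary choice of 3 centers:

```r
x <- scale(iris[, 1:4])   # stand-in data, an assumption for illustration
set.seed(1)
fit <- kmeans(x, centers = 3, nstart = 25)

# The total sum of squares splits into a within-cluster and a between-cluster part
within  <- fit$tot.withinss
between <- fit$betweenss
total   <- fit$totss

# Share of the total variation lying between clusters (closer to 1 means tighter clusters)
ratio <- between / total
print(ratio)
```

Note that this ratio always grows as the number of clusters increases, so it is not enough on its own to choose the number of clusters.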
When I start looking for clusters, I often use multidimensional scaling to view the data in 2D.

Look up help(cmdscale).
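
As a rough illustration of that step, the sketch below runs classical multidimensional scaling on Euclidean distances and plots the first two coordinates. The iris measurements are stand-in data (an assumption); substitute your own numeric data and distance measure.

```r
x <- scale(iris[, 1:4])     # stand-in data, an assumption for illustration
d <- dist(x)                # Euclidean distances between observations
mds <- cmdscale(d, k = 2)   # classical MDS: one row of 2D coordinates per observation

# Visible groups in this plot are a hint that clustering may be worthwhile
plot(mds[, 1], mds[, 2],
     xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Classical MDS of the scaled data")
```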

If after this you wish to proceed, then I suggest you look at the cluster package, library(cluster).
Its silhouette function is a nice tool for assessing the appropriate number of clusters.
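
A possible way to use it, sketched under the assumption that the iris measurements stand in for your data and that 2 to 6 clusters are the candidates: compute the average silhouette width for each candidate number of clusters and prefer the one with the largest value.

```r
library(cluster)

x <- scale(iris[, 1:4])   # stand-in data, an assumption for illustration
d <- dist(x)

set.seed(1)
ks <- 2:6                 # candidate numbers of clusters (arbitrary range)
avg_width <- sapply(ks, function(k) {
  fit <- kmeans(x, centers = k, nstart = 25)
  sil <- silhouette(fit$cluster, d)   # one silhouette width per observation
  mean(sil[, "sil_width"])
})
names(avg_width) <- ks
print(round(avg_width, 3))
```

Low average widths for every candidate would suggest that there is little cluster structure to find.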

Regards

Wayne


-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org]On Behalf Of "Julia Kröpfl"
Sent: 25 September 2007 10:01
To: R-help at r-project.org
Subject: [R] finding a stable cluster for kmeans


Hello!

I applied kmeans to my data:

kcluster = kmeans(mydata, 4, iter.max=10)
table(code, kcluster$cluster)

If I run this code again, I get a different result than in the first trial. (I understand that this is correct, since kmeans starts by assigning the clusters randomly and therefore the outcomes can differ.)
But is there a way to stabilize the clustering (meaning, to find the solution that appears most often in 10 trials)?

Thank you for any ideas,
Julia 


