[R] K-Means clustering Algorithm

David L Carlson dcarlson at tamu.edu
Wed Aug 29 17:43:26 CEST 2012


It depends very much on what you consider "a small amount of error." Unless
you specify the starting centroids (seeds), K-means does not necessarily
produce a unique partition for a particular data set. In other words, you
can get different results by running MATLAB's kmeans twice on the same data
set (and the same is true of R's kmeans). One way of reducing that
possibility is to use multiple sets of randomly chosen starting seeds
(nstart=10 in R's kmeans or the 'replicates' option in MATLAB). With
nstart=10, kmeans runs 10 times and keeps the solution with the smallest
total within-cluster sum of squares.

R's kmeans offers three different algorithms (Hartigan-Wong, Lloyd/Forgy,
and MacQueen). By looking at the references in MATLAB's description of
kmeans and R's, you should be able to figure out how to match the two if
that is really necessary. MATLAB has multiple options for measuring
distance, whereas R's kmeans uses only Euclidean distance. MATLAB also has
several methods for choosing starting seeds. In R you would have to use or
create a function to compute those starting seeds and then pass them to
kmeans through the centers argument.
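
As a rough illustration of the points above, here is a minimal sketch in R.
The data are simulated and the choice of starting rows is arbitrary; both
are assumptions made for the example, not part of the original question.
It shows multiple random starts, explicit starting centroids, and a
cross-tabulation for comparing two partitions. To compare against MATLAB,
you would substitute the cluster index vector exported from MATLAB for one
of the two vectors in the table() call.

```r
# Simulated data (an assumption for this sketch): two well-separated groups
# of 50 points each in two dimensions.
set.seed(42)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

# 1. Multiple random starts: kmeans keeps the run with the smallest
#    total within-cluster sum of squares.
fit_multi <- kmeans(x, centers = 2, nstart = 10)
fit_multi$tot.withinss

# 2. Explicit starting centroids: pass a matrix with one row per cluster.
#    Rows 1 and 51 are an arbitrary choice (one point from each group).
start <- x[c(1, 51), ]
fit_a <- kmeans(x, centers = start)
fit_b <- kmeans(x, centers = start)

# With the same data and the same starting centroids, the two runs agree.
same_partition <- identical(fit_a$cluster, fit_b$cluster)
same_partition

# 3. Compare two partitions. Cluster numbers are arbitrary labels, so
#    cross-tabulate rather than compare the label vectors directly.
#    Two partitions agree when each row has exactly one non-zero cell.
tab <- table(multi = fit_multi$cluster, fixed = fit_a$cluster)
tab
```

Because the cluster numbers carry no meaning, cluster 1 in one solution may
correspond to cluster 2 in the other; the cross-tabulation makes that
correspondence visible, along with any observations that were assigned
differently.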

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of olemissrebs1123
> Sent: Tuesday, August 28, 2012 3:16 PM
> To: r-help at r-project.org
> Subject: [R] K-Means clustering Algorithm
> 
> I was wondering if there is an R equivalent to the two-phase approach
> that MATLAB uses in performing the kmeans algorithm. If not, is there a
> way that I can determine whether kmeans in R and kmeans in MATLAB are
> giving me essentially the same clustering information, within a small
> amount of error?



