[R] kmeans function

Tomassini, Letizia tomassini at vetmed.wsu.edu
Wed Mar 26 21:01:01 CET 2014


I would like to understand why the fastclus procedure in SAS is affected by the order of the input data. With the same dataset sorted differently, I get different cluster arrangements, which I find really disturbing. R seems to find the stable solution when I use nstart=100, but I do not know how R does this or how to replicate it in SAS. All I know so far is that proc fastclus also uses k-means.
Regarding R: does it have a way of always choosing the same starting seeds? Does it reorder the dataset internally before running kmeans?
I am interested in finding the clustering with the best global minimum and extracting its seeds. I need those seeds for subsequent solutions with a different number of clusters (for example, to settle on fewer clusters and start from specific seeds). Overall I am more comfortable with SAS, and I am trying to learn this part of the clustering design from R so that I can implement it in SAS.
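The questions above come down to how the starting centers are chosen. The sketch below contrasts two approaches, under stated assumptions. The first is an order-dependent rule that takes the first k rows as seeds. This rule is hypothetical: it captures the spirit of order-based seeding, but it is not SAS's actual FASTCLUS rule. The second is a multi-start approach: draw k rows at random, run k-means, repeat nstart times, and keep the run with the smallest total within-cluster sum of squares.

As far as I know, R's kmeans does not re-sort the data. When centers is a number, it picks k random rows as starting centers, so set.seed() makes runs reproducible. With nstart > 1, it keeps the run with the lowest tot.withinss. Note that R's default algorithm is Hartigan-Wong, whereas this sketch uses plain Lloyd iterations for simplicity. The function names (lloyd, first_k_seeds, multistart) are illustrative only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd k-means (not R's default Hartigan-Wong) from given centers.

    Returns (labels, centers, total within-cluster sum of squares).
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    tot = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, tot


def first_k_seeds(points, k):
    """Hypothetical order-dependent seeding: the first k rows as listed."""
    return points[:k]


def multistart(points, k, nstart, seed=1):
    """Run Lloyd from nstart random sets of k rows; keep the lowest total WSS."""
    rng = random.Random(seed)  # fixed seed, analogous to set.seed() in R
    best = None
    for _ in range(nstart):
        init = rng.sample(points, k)  # k distinct rows chosen at random
        result = lloyd(points, init)
        if best is None or result[2] < best[2]:
            best = result
    return best


if __name__ == "__main__":
    # The same six 1-D points, listed in two different orders.
    order_a = [(0,), (1,), (10,), (11,), (20,), (21,)]
    order_b = [(0,), (10,), (20,), (1,), (11,), (21,)]
    print(lloyd(order_a, first_k_seeds(order_a, 3))[2])  # 101.0: poor local minimum
    print(lloyd(order_b, first_k_seeds(order_b, 3))[2])  # 1.5: the global minimum
    print(multistart(order_a, 3, nstart=100)[2])
    print(multistart(order_b, 3, nstart=100)[2])
```

With first-k seeding, the sort order alone decides which local minimum is reached. Many random restarts make it very likely that at least one run reaches the lowest objective, whatever the order. The winning run's centers can then be reused as fixed starting seeds. In R, this is done by passing a matrix, rather than a number, as the centers argument of kmeans.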


Please let me know if you can help.

Letizia



________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of Ranjan Maitra [maitra.mbox.ignored at inbox.com]
Sent: Wednesday, March 26, 2014 12:48
To: r-help at stat.math.ethz.ch
Subject: Re: [R] kmeans function

On Wed, 26 Mar 2014 18:35:34 +0000 "Tomassini, Letizia"
<tomassini at vetmed.wsu.edu> wrote:

>
> Hello
> I need to ask some questions about the k-means clustering function. Mainly, I would like to know why, with nstart set to a large enough number, kmeans always finds the same cluster arrangement, even when the input dataset is sorted in different ways or I remove a few observations. I cannot seem to reproduce that in SAS.

Do you understand what kmeans does? Why would you expect otherwise?
Besides, why does the function have to match SAS's output? (Do you
know how SAS goes about initializing the algorithm?) In any case,
should it not provide the correct answer (the global minimum, if
possible)?

Ranjan


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



