[Rd] kmeans

Martin Maechler maechler at stat.math.ethz.ch
Mon Jul 5 18:54:36 CEST 2010


>>>>> Gabor Grothendieck <ggrothendieck at gmail.com>
>>>>>     on Fri, 2 Jul 2010 18:50:28 -0400 writes:

    > In kmeans() in stats one gets an error message with the default
    > clustering algorithm if centers = 1.  It's often useful to calculate
    > the sum of squares for 1 cluster, 2 clusters, etc., and this error
    > complicates things since one has to treat 1 cluster as a special case.
    > A second reason is that easily getting the 1-cluster sum of squares
    > makes it easy to calculate the between-cluster sum of squares when
    > there is more than 1 cluster.
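
The relationship described above can be sketched without calling kmeans()
for the 1-cluster case at all. This is only an illustration, assuming the
centers are the cluster means (so that total = within + between holds); the
variable names and the use of USArrests are choices made for this sketch.

```r
# Total (1-cluster) sum of squares: squared deviations from the column means.
x <- as.matrix(USArrests)
totss <- sum(scale(x, center = TRUE, scale = FALSE)^2)

# Within-cluster sum of squares for k = 3 (random starts, so fix the seed).
set.seed(1)
fit <- kmeans(x, centers = 3, nstart = 10)
tot_withinss <- sum(fit$withinss)

# Between-cluster sum of squares follows by subtraction.
betweenss <- totss - tot_withinss
```
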

    > I suggest adding the line marked ### to the source code of kmeans (the
    > other lines shown are just there to illustrate context).  Adding this
    > line forces kmeans to use the code for algorithm 3 if centers is 1.
    > This is useful since, unlike the code for the default algorithm, the
    > code for algorithm 3 succeeds for centers = 1.

    > if(length(centers) == 1) {
    > if (centers == 1) nmeth <- 3 ###
    > k <- centers
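
For an R version without this change, a user-level workaround might look
like the sketch below. It assumes that "algorithm 3" corresponds to
algorithm = "MacQueen" in the public interface; kmeans_any_k() is a
hypothetical helper name, not something in stats.

```r
# Hypothetical helper (not part of stats): route the 1-cluster case to
# MacQueen's algorithm and leave every other case on the default.
kmeans_any_k <- function(x, centers, ...) {
  if (length(centers) == 1L && centers == 1) {
    kmeans(x, centers = 1, algorithm = "MacQueen", ...)
  } else {
    kmeans(x, centers = centers, ...)
  }
}

# Total within-cluster sum of squares for 1 to 5 clusters, with no
# special-casing of k = 1 in the calling code.
set.seed(1)
wss <- sapply(1:5, function(k) sum(kmeans_any_k(USArrests, k)$withinss))
```

With the helper, the k = 1 entry of wss is simply the total sum of squares
about the column means.
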

I agree that this is a reasonable improvement,
and have applied it (plus documentation and an example) to the R-devel sources.

Thank you, Gabor.


    > Also note that KMeans in Rcmdr produces a betweenss and a tot.withinss
    > and it would be nice if kmeans in stats did that too:

Well, patches (to the R-devel *sources*) are happily accepted.

Martin


    >> library(Rcmdr)
    >> str(KMeans(USArrests, 3))
    > List of 6
    > $ cluster     : Named int [1:50] 1 1 1 2 1 2 3 1 1 2 ...
    > ..- attr(*, "names")= chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
    > $ centers     : num [1:3, 1:4] 11.81 8.21 4.27 272.56 173.29 ...
    > ..- attr(*, "dimnames")=List of 2
    > .. ..$ : chr [1:3] "1" "2" "3"
    > .. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
    > $ withinss    : num [1:3] 19564 9137 19264
    > $ size        : int [1:3] 16 14 20
    > $ tot.withinss: num 47964  <=================
    > $ betweenss   : num 307844 <=================
    > - attr(*, "class")= chr "kmeans"
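
Until kmeans() in stats returns these components itself, they can be
derived from what it already provides. The following is a minimal sketch,
not the Rcmdr implementation; add_ss() is a hypothetical helper, and the
between-cluster formula assumes the returned centers are the cluster means.

```r
# Hypothetical helper: attach tot.withinss and betweenss to a kmeans fit.
add_ss <- function(fit, x) {
  x <- as.matrix(x)
  grand_mean <- colMeans(x)
  # Squared distance of each cluster center from the grand mean.
  d2 <- rowSums(sweep(fit$centers, 2, grand_mean)^2)
  fit$tot.withinss <- sum(fit$withinss)
  # Each cluster contributes its size times that squared distance.
  fit$betweenss <- sum(fit$size * d2)
  fit
}

set.seed(1)
fit <- add_ss(kmeans(USArrests, 3), USArrests)
```
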

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel


