[Rd] k means

cgenolin at u-paris10.fr
Sat May 17 00:54:55 CEST 2008


Hi list,

I tried flexclust, but I cannot see what is wrong with my
(very simple) code...
Would you have a few minutes to check it?

Thanks for your help.

Christophe
--- 8< --------------------------------
library(flexclust)  # provides kccaFamily() and kcca()

data  <- rbind(c(1,2 ,NA,4 ),
               c(1,1 ,NA,1 ),
               c(2,3 ,4 ,5 ),
               c(2,2 ,2 ,2 ),
               c(3,NA,NA,6 ),
               c(3,NA,NA,3 ),
               c(2,4 ,4 ,NA),
               c(2,3 ,2 ,NA))

# Two reference centers used only to test distNA()
distTest <- rbind(c(0,0,0,0),
                  c(1,1,1,1))

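# Distance from every row of x to every row of centers.
# dist() skips coordinates that are NA in either vector and rescales the sum.
distNA <- function(x, centers) {
    z <- matrix(0, nrow = nrow(x), ncol = nrow(centers))
    for (k in seq_len(nrow(centers))) {
        # row is one observation; dist() returns its distance to center k
        z[, k] <- apply(x, 1, function(row) dist(rbind(row, centers[k, ])))
    }
    z
}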

distNA(data,distTest)

km <- kccaFamily(dist=distNA,cent=colMeans)
kcca(x=data,k=2,family=km)
kcca(x=data,k=3,family=km)

--- 8< --------------------------------
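A quick base-R check, offered as a sketch rather than a diagnosis, of two things the code above relies on: how dist() treats NA coordinates, and what colMeans() returns when a column contains NA. The helper name centNA is hypothetical, and whether passing it as the cent argument of kccaFamily() resolves the problem is an assumption that is not tested here.

```r
# Base R only; flexclust is not needed for this check.
data <- rbind(c(1, 2, NA, 4),
              c(1, 1, NA, 1),
              c(2, 3, 4, 5),
              c(2, 2, 2, 2),
              c(3, NA, NA, 6),
              c(3, NA, NA, 3),
              c(2, 4, 4, NA),
              c(2, 3, 2, NA))

# dist() drops coordinates that are NA in either row and scales the sum of
# squares up by (number of columns) / (number of columns used).
# Row 1 against the origin uses 3 of 4 columns: sqrt((1 + 4 + 16) * 4 / 3).
d <- dist(rbind(data[1, ], c(0, 0, 0, 0)))
print(as.numeric(d))

# colMeans() without na.rm returns NA for every column containing an NA,
# so a centroid computed this way is NA in columns 2 to 4.
plain <- colMeans(data)
print(plain)

# Hypothetical alternative centroid: average only the observed values.
# A column that is entirely NA within a cluster would still give NaN.
centNA <- function(x) colMeans(x, na.rm = TRUE)
robust <- centNA(data)
print(robust)
```

With cent=colMeans, a centroid that is NA in a column makes dist() ignore that column for every observation, so the distances end up resting on fewer coordinates than intended.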






>>>>>> On Mon, 12 May 2008 19:24:55 +0200,
>>>>>> cgenolin  (c) wrote:
>
>  > Hi devel list,
>  > I am using k-means with a non-standard distance. As far as I can see, the
>  > function kmeans can use 4 different algorithms, but not
>  > a user-defined distance.
>
>  > In addition, kmeans cannot deal with missing values, whereas
>  > there are several ways for k-means to handle them; one
>  > is to use a distance that takes missing values into account, like a
>  > distance with the Gower adjustment (which is what the regular dist()
>  > in R does).
>
>  > So is it possible to adapt kmeans to let the user give an argument
>  > 'distance to use'?
>
> As Bill Venables already pointed out, that does not make much sense,
> especially as there are already R functions for doing that. Package
> flexclust implements a k-means-type clustering algorithm where the
> user can provide arbitrary distance measures; have a look at
>
>     http://www.stat.uni-muenchen.de/~leisch/papers/Leisch-2006.pdf
>
> The code you need to write for using a new distance measure is
> minimal, and there are two examples in the paper describing in detail
> what needs to be done.
>
> Hope this helps,
> Fritz Leisch
>
> --
> -----------------------------------------------------------------------
> Prof. Dr. Friedrich Leisch
>
> Institut für Statistik                          Tel: (+49 89) 2180 3165
> Ludwig-Maximilians-Universität                  Fax: (+49 89) 2180 5308
> Ludwigstraße 33
> D-80539 München                     http://www.statistik.lmu.de/~leisch
> -----------------------------------------------------------------------
>   Journal Computational Statistics --- http://www.springer.com/180
>          Münchner R Kurse --- http://www.statistik.lmu.de/R
> -----------------------------------------------------------------------
>
>


