[R] bug (?!) in "pam()" clustering from fpc package ?

Martin Maechler maechler at stat.math.ethz.ch
Sat Dec 20 13:59:43 CET 2008


>>>>> "TG" == Tal Galili <tal.galili at gmail.com>
>>>>>     on Sat, 20 Dec 2008 12:20:49 +0200 writes:

    TG> Thanks for the clarification, Christian.  For future
    TG> people who will search, I did eventually find a way to
    TG> do k-means with Manhattan distances, by using the
    TG> "cclust" command (from the cclust package)

    TG> (where the parameter to change is "dist": if
    TG> "euclidean", then the mean square error is used; if
    TG> "manhattan", the mean absolute error is used)

hmm,  "cclust" is indeed very flexible in its choice of
clustering options.
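
For the archives, a minimal sketch of the kind of cclust() call Tal
describes, assuming the 'cclust' package is installed. The simulated
data follow Tal's original example; the seed and the object names are
only illustrative.

```r
library(cclust)  # assumption: the 'cclust' package is installed

set.seed(42)     # arbitrary seed, only for reproducibility
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

# k-means-type clustering into 2 groups, using the "manhattan" option of 'dist'
cl <- cclust(x, centers = 2, dist = "manhattan", method = "kmeans")

cl$centers         # one row per cluster centre
table(cl$cluster)  # cluster sizes
```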

However, as I am the maintainer of the recommended package
"cluster", and as I *added* the feature to run pam() with given
"starting medoids" a (not so long) while ago (together with the
'trace.lev' argument, which lets you see what the algorithm does),
exactly for the purpose of allowing more experimentation,
I really want to state explicitly here that
pam() *does* allow you to work with specified initial medoids.
I'm pretty sure (but haven't checked the formulae) that this
would still not be equivalent to "kmeans with manhattan
distance".
In particular, pam() should be more robust than any kmeans
incantation, since pam() does not use any (non-robust) mean.
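
To illustrate, here is a minimal sketch using Tal's simulated data
(the seed and the object names are only for illustration). It assumes
a version of "cluster" whose pam() has the 'medoids', 'trace.lev' and
'do.swap' arguments, as in the current documentation. Note that the
medoids reported in the result are the *final* ones; the supplied
ones only serve as the starting point of the swap phase.

```r
library(cluster)  # pam() lives in 'cluster', not in 'fpc'

set.seed(1)       # arbitrary seed, only for reproducibility
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

# Start from observations 1 and 16; the swap phase may still replace them.
# trace.lev = 1 prints some diagnostics about the algorithm's progress.
fit <- pam(x, 2, medoids = c(1, 16), metric = "manhattan", trace.lev = 1)
fit$id.med    # final medoids, not necessarily c(1, 16)

# Skipping the swap phase keeps the supplied medoids unchanged.
fit0 <- pam(x, 2, medoids = c(1, 16), metric = "manhattan", do.swap = FALSE)
fit0$id.med
```

With groups as well separated as these, both fits should produce the
same two clusters; only the medoid chosen within each cluster may
differ.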

Martin Maechler,
ETH Zurich




    TG> On Wed, Dec 17, 2008 at 1:25 PM, Christian Hennig
    TG> <chrish at stats.ucl.ac.uk> wrote:

    >> Dear Tal,
    >> 
    >> pam is not in the fpc package but in the cluster
    >> package. Look at ?pam and ?pam.object to find out what it
    >> does.  As far as I see, the medoids in the output object
    >> are the final cluster medoids, not the initial ones,
    >> which presumably explains the observed behaviour.
    >> 
    >> Best regards, Christian
    >> 
    >> 
    >> On Wed, 17 Dec 2008, Tal Galili wrote:
    >> 
    >>> Hello all.
    >>> I wish to run k-means with "manhattan" distance.  Since
    >>> this is not supported by the function "kmeans", I turned
    >>> to the "pam" function in the "fpc" package.  Yet, when I
    >>> tried to have the algorithm run with different starting
    >>> points, I found that pam ignores them and keeps starting
    >>> the algorithm from the same starting points (medoids).
    >>> 
    >>> My questions: 1) is there a bug in the code or in
    >>> the way I am using it?  2) is there a way to either fix
    >>> the code or use another function in some package that can
    >>> run kmeans with manhattan distance (manhattan distances
    >>> are the sum of absolute differences)?
    >>> 
    >>> Here is some sample code:
    >>> 
    >>>   require(fpc)
    >>>   x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
    >>>              cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)))
    >>>   pam(x, 2, medoids = c(1,16))
    >>> 
    >>> Output:
    >>> 
    >>>   Medoids:
    >>>        ID
    >>>   [1,]  3 -0.1406026 0.1131493
    >>>   [2,] 17  4.9564839 4.6480520
    >>>   ...
    >>> 
    >>> So the initial medoids were 3 and 17, not 1 and 16 as
    >>> I asked.
    >>> 
    >>> 
    >>> 
    >>> Thanks, Tal
    >>> 
    >>> 
    >>> 
    >>> --
    >>> ----------------------------------------------
    >>> Tal Galili Phone number: 972-50-3373767 FaceBook: Tal
    >>> Galili My Blogs: www.talgalili.com
    >>> www.biostatistics.co.il
    >>> 
    >>> 
    >> *** --- ***
    >> Christian Hennig
    >> University College London, Department of Statistical Science
    >> Gower St., London WC1E 6BT, phone +44 207 679 1698
    >> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
    >> 


