[R] [cluster package question] What is the "sum of the dissimilarities" in the pam command ?

Martin Maechler maechler at stat.math.ethz.ch
Mon Mar 30 11:31:46 CEST 2009


>>>>> "TG" == Tal Galili <tal.galili at gmail.com>
>>>>>     on Sun, 29 Mar 2009 03:09:17 +0300 writes:

    TG> Hello Martin Maechler and All,
    TG> A simple question (I hope):
    TG> How can I compute the "sum of the dissimilarities" that appears in the pam
    TG> command (from the cluster package) ?


    TG> Is it the "manhattan" distance (such as the one implemented by "dist") ?


Well, that first depends on whether 'x' in pam(x, k, diss, metric, ...)
is *itself* a dissimilarity object or not.
-->  help(daisy)  and  help(dist)
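
A minimal sketch of the two cases, on made-up toy data (the data and
the object names d_man, fit_diss and fit_raw are only illustrative):

```r
library(cluster)

set.seed(2)
x <- matrix(rnorm(60), ncol = 2)         # 30 toy observations, 2 variables

# Case 1: 'x' is itself a dissimilarity object, here built with dist().
# pam() then uses these dissimilarities as given; 'metric' plays no role.
d_man    <- dist(x, method = "manhattan")
fit_diss <- pam(d_man, k = 3)

# Case 2: 'x' is a data matrix, so pam() computes the dissimilarities
# itself, according to the 'metric' argument.
fit_raw  <- pam(x, k = 3, metric = "manhattan")

fit_diss$id.med                          # indices of the medoids, case 1
fit_raw$id.med                           # indices of the medoids, case 2
```

Both calls work from Manhattan distances on the same data, so I would
expect them to lead to the same clustering.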

If it is *not*  --- which I assume from your question ---
then the answer depends on the 'metric' argument of pam().

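As you did not mention it, I assume you left 'metric' at its
default, which is "euclidean", i.e., not "manhattan". In that case
the dissimilarities are Euclidean distances, so a check of the
assignments against Manhattan distances need not agree.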
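
As a rough sketch of how one could check this by hand (toy data again;
D, assigned_med, nearest_med and sum_diss are just names chosen for
this example, not part of the cluster package):

```r
library(cluster)

set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2))
n <- nrow(x)

fit <- pam(x, k = 2)                     # 'metric' left at "euclidean"

# Full Euclidean distance matrix between all observations
D <- as.matrix(dist(x, method = "euclidean"))

# Medoid (as observation index) of the cluster each observation is in;
# the match() step avoids assuming any particular cluster numbering.
assigned_med <- fit$id.med[match(fit$clustering,
                                 fit$clustering[fit$id.med])]

# Medoid that is closest to each observation in Euclidean distance
nearest_med <- fit$id.med[apply(D[, fit$id.med, drop = FALSE], 1, which.min)]

all(assigned_med == nearest_med)         # should be TRUE for this metric

# Sum of the dissimilarities of each observation to its own medoid
sum_diss <- sum(D[cbind(seq_len(n), assigned_med)])
sum_diss
```

Repeating the comparison with method = "manhattan" in dist(), while
pam() still uses "euclidean", is the kind of mismatch that could
explain a figure such as your 90%.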



    TG> I am asking since I am running clustering on a dataset. I found 7 medoids
    TG> with the pam command, and from it I have the medoid to which each
    TG> observation belongs. But when I check it, I find that only (about) 90% of
    TG> the observations have their minimum manhattan distance to the medoid that
    TG> pam assigned them to.

    TG> If this is the manhattan distance that is used, I will create some toy data
    TG> to see if I can reproduce this.

Yes, specifying some reproducible toy data and specific R code
is almost always useful and typically more productive when
asking such questions by e-mail.

Regards,
Martin Maechler, ETH Zurich

    TG> Thanks,
    TG> Tal




