[R] Specifying medoids in PAM?

Martin Maechler maechler at stat.math.ethz.ch
Thu Jun 9 01:08:50 CEST 2005


>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Wed, 8 Jun 2005 18:57:55 +0200 writes:

>>>>> "David" == David Finlayson <david.p.finlayson at gmail.com>
>>>>>     on Wed, 8 Jun 2005 09:24:54 -0700 writes:

    David> Sorry, I wasn't trying to submit a bug report just yet. 

    MM> the posting guide asks you to provide reproducible examples, in
    MM> any case, not just for bug reports ...
    MM> {and strictly speaking, you still haven't provided one, since
    MM> it's a bit painful to read in your table below -- because of the
    MM> extra row names ... but here I'm nitpicking a bit }

    David> I wanted to see if I was using the command correctly. 

    MM> Yes, you were.


    >>> pam(stats.table, metric="euclidean", stand=TRUE, medoids=c(1,3,20,2,5), k=5)

    David> This command crashes RGUI.exe and windows sends an error report to
    David> Microsoft. It also crashes if I first subtract the NA rows from
    David> stats.table.

    MM> I can confirm that I get segmentation faults using this example data
    MM> with k=5, so effectively, it seems you've uncovered a bug in pam().
    MM> I will investigate and patch eventually.

I found and fixed the bug:  
Some part of the C code was assuming that the indices in
'medoids' were sorted (increasingly).

I.e., for the moment you can easily work around the problem by
using
   pam(stats.table, ...., medoids=c(1,2,3,5,20), k=5)
instead of
   pam(stats.table, ...., medoids=c(1,3,20,2,5), k=5)
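
As a minimal sketch of that workaround (not the actual fix in the C code),
one can simply sort the starting indices before passing them to pam().
The matrix 'x' below is a made-up stand-in for stats.table, which is not
reproduced here; the other arguments are the ones from the original call.

```r
library(cluster)

## Hypothetical stand-in for stats.table: 30 observations, 2 variables.
set.seed(1)
x <- matrix(rnorm(60), ncol = 2)

## Initial medoids in the unsorted order from the original call.
init <- c(1, 3, 20, 2, 5)

## Workaround: hand pam() the same indices, sorted increasingly.
fit <- pam(x, k = 5, metric = "euclidean", stand = TRUE,
           medoids = sort(init))

fit$id.med        # indices of the final medoids
fit$clustering    # cluster membership for each observation
```

Sorting changes only the order of the indices, so the same five
observations are still used as starting medoids.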


The next version of the cluster package, which allows specifying
the "fuzziness exponent" in fanny(), will have this problem
fixed.

Martin Maechler,
ETH Zurich



