[R] pam() seems to ignore cluster number

Martin Maechler maechler at stat.math.ethz.ch
Tue May 24 08:50:17 CEST 2011


>>>>> Dario Strbenac <D.Strbenac at garvan.org.au>
>>>>>     on Wed, 18 May 2011 12:00:11 +1000 writes:

    > I am using PAM with k = 10 clusters, but I only get one cluster
    > ID for all my observations. I couldn't find any discussion about
    > this in the help file, or mailing lists.  Is there a reasonable
    > explanation for this result ?

    > cIDs <- pam(all, 10, cluster.only = TRUE, do.swap = FALSE)
    >> table(cIDs)
    > cIDs
    > 0 
    > 16671

    > The matrix of observations can be found at :
    > http://129.94.136.7/file_dump/dario/all.obj

For the mailing list archives:

Dario's data contained so many NAs that some of the computed
dissimilarities "had to be" NA as well.
Had he used either
    pam(all, 10)
or
    pam(all, 10, do.swap = FALSE)

he would have got the error message

   "No clustering performed, NAs in the computed dissimilarity matrix."

But because of  'cluster.only=TRUE' 
*and* because of a lapsus of the 'cluster' maintainer (me),
pam()  returned without the error message in this case.

The next release of R (or of 'cluster') will also give the error
message in the case of 'cluster.only=TRUE'.
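
For readers who hit the same symptom, here is a minimal sketch of how one
might check for NA dissimilarities before calling pam(). It rests on several
assumptions: the toy matrix 'x' is made up (it is not Dario's data), base R's
dist() is used only as a stand-in for the dissimilarities that pam() computes
internally, and dropping incomplete rows is just one possible workaround, not
something recommended in this thread.

```r
# Hypothetical toy data: rows 1 and 2 have no observed column in common,
# so no distance can be computed between them and dist() returns NA there.
x <- rbind(c(1,   NA),
           c(NA,  2),
           c(3,   4),
           c(5,   6),
           c(3.5, 4.5),
           c(5.5, 6.5))

d <- dist(x)               # Euclidean distances from base R ('stats')
na_in_d <- any(is.na(d))   # TRUE when at least one pair has no shared columns

if (na_in_d) {
  # Assumed workaround: keep only the rows without any NA.
  x_ok <- x[stats::complete.cases(x), , drop = FALSE]
} else {
  x_ok <- x
}

# Cluster only if the 'cluster' package is installed.
if (requireNamespace("cluster", quietly = TRUE)) {
  cIDs <- cluster::pam(x_ok, 2, cluster.only = TRUE)
  print(table(cIDs))
}
```

Whether discarding rows is acceptable depends on the data; imputing values
or passing pam() a precomputed dissimilarity object without NAs would be
alternatives.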

Martin Maechler, ETH Zurich

    > I'm using R version 2.13.0 (2011-04-13) on Platform:
    > x86_64-unknown-linux-gnu (64-bit) and have cluster_1.13.3.

    > --------------------------------------
    > Dario Strbenac
    > Research Assistant
    > Cancer Epigenetics
    > Garvan Institute of Medical Research
    > Darlinghurst NSW 2010
    > Australia


