[R] agnes() in package cluster on R 2.14.1 and R 3.0.1

Martin Maechler maechler at stat.math.ethz.ch
Wed Jun 12 14:59:48 CEST 2013


>>>>> Hugo Varet <varethugo at gmail.com>
>>>>>     on Tue, 11 Jun 2013 15:15:36 +0200 writes:

    > Dear Martin,
    > Thank you for your answer. Here is the exact call to agnes():
    > setwd("E:/Hugo")
    > library(cluster)
    > load("mydata.rda")
    > tableauTani<-dist.binary(mydata, method = 4, diag = FALSE, upper = FALSE)
    > resAgnes.Tani<-agnes(tableauTani, diss = inherits(tableauTani,
    > "dist"),method = "ward")
    > classe.agnTani.3 <- cutree(resAgnes.Tani, 3)

    > I'm going to send you the data in a separated e-mail.

Thank you, Hugo, and I got that alright.

I can see that many of the distances are *identical*, because
your data is completely binary.
From experience, I know that this can lead (for some algorithms)
to "arbitrary" decisions in clustering, namely when two
*pairs* of observations / clusters have exactly the same
distance, it is somewhat random which of the pair is "merged" /
"fused" first, in a bottom up hierarchical algorithm such as agnes().
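
As a rough illustration of how common such ties are, here is a minimal
sketch on simulated binary data (not Hugo's data). It assumes base R's
dist(method = "binary") as a stand-in for dist.binary(), whose origin is
not known at this point; the object names are made up for the example.

```r
## Sketch: tied dissimilarities in purely binary data (simulated data).
set.seed(1)
x <- matrix(rbinom(30 * 8, size = 1, prob = 0.5), nrow = 30)  # 30 obs., 8 binary variables

## Base R's binary (Jaccard-type) distance, used here only as a stand-in
d <- dist(x, method = "binary")

dv         <- as.vector(d)
n_pairs    <- length(dv)           # choose(30, 2) pairwise distances
n_distinct <- length(unique(dv))   # number of different distance values
n_tied     <- sum(duplicated(dv))  # distances that repeat an earlier value

cat("pairs:", n_pairs, " distinct values:", n_distinct, " tied:", n_tied, "\n")
```

With only 8 binary variables, the distances can take just a handful of
distinct values, so most pairs are necessarily tied with some other pair.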

To reproduce your example (above), however, I need to know
*where* you got the  dist.binary()  function from.
It is not part of standard R nor of the cluster package.
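
One way to find this out in the session where the function is available
is sketched below. It uses dist() from the stats package as a stand-in,
since dist.binary() is not available here; replacing the name with
dist.binary is an assumption about how it would be used.

```r
## Sketch: finding out where a function comes from.
## dist() is a stand-in; use dist.binary in the session where it is loaded.
f <- dist
where_attached <- find("dist")                     # search-path entries that contain it
defining_env   <- environmentName(environment(f))  # namespace that defines it
print(where_attached)
print(defining_env)
```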

Regards,
Martin


    > Regards,

    > Hugo


    > On Monday, 10 June 2013, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
    >>>>>>> Hugo Varet <varethugo at gmail.com>
    >>>>>>> on Sun, 9 Jun 2013 11:43:32 +0200 writes:
    >> 
    >> > Dear R users,
    >> > I discovered something strange using the function agnes() of the cluster
    >> > package on R 3.0.1 and on R 2.14.1. Indeed, the clusterings obtained are
    >> > different whereas I ran exactly the same code.
    >> 
    >> hard to believe... but ..
    >> 
    >> > I quickly looked at the source code of the function and I discovered that
    >> > there was an important change: agnes() in R 2.14.1 used a FORTRAN code
    >> > whereas agnes() in R 3.0.1 uses a C code.
    >> 
    >> well, it has done so for quite a bit longer, e.g., also in R 2.15.0
    >> 
    >> > Here is one of the contingency tables between R 2.14.1 and R 3.0.1:
    >> >                       classe.agnTani.2.14.1
    >> > classe.agnTani.3.0.1    1    2    3
    >> >                    1   74    0  229
    >> >                    2    0  235    0
    >> >                    3  120    0   15
    >> 
    >> > So, I was wondering if it was normal that the C and FORTRAN codes give
    >> > different results?
    >> 
    >> It's not normal, and I'm pretty sure I have had many many
    >> examples which gave identical results.
    >> 
    >> Can you provide a reproducible example, please?
    >> If the example is too large [for dput() ], please send me the *.rda
    >> file produced from
    >> save(<your data>, file=<the file I need>)
    >> *and* the exact call to agnes() for your data.
    >> 
    >> Thank you in advance!
    >> 
    >> Martin Maechler,
    >> the one you could have e-mailed directly
    >> to using   maintainer("cluster") ...
    >> 
    >> 
    >> > Best regards,
    >> > Hugo Varet
    >> 
    >> > [[alternative HTML version deleted]]
    >> ^^^^^^^^^^^^^ try to avoid, please ^^^^^^^^^^^^^^^^^
    >> 
    >> > ______________________________________________
    >> > R-help at r-project.org mailing list
    >> > https://stat.ethz.ch/mailman/listinfo/r-help
    >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    >> > and provide commented, minimal, self-contained, reproducible code.
    >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    >> yes indeed, please.
    >>


