[R] agnes() in package cluster on R 2.14.1 and R 3.0.1
maechler at stat.math.ethz.ch
Wed Jun 12 14:59:48 CEST 2013
>>>>> Hugo Varet <varethugo at gmail.com>
>>>>> on Tue, 11 Jun 2013 15:15:36 +0200 writes:
> Dear Martin,
> Thank you for your answer. Here is the exact call to agnes():
> tableauTani <- dist.binary(mydata, method = 4, diag = FALSE, upper = FALSE)
> resAgnes.Tani <- agnes(tableauTani, diss = inherits(tableauTani, "dist"),
>                        method = "ward")
> classe.agnTani.3 <- cutree(resAgnes.Tani, 3)
> I'm going to send you the data in a separate e-mail.
Thank you, Hugo, and I got that alright.
I can see that many of the distances are *identical*, because
your data is completely binary.
From experience, I know that this can lead (for some algorithms)
to "arbitrary" decisions in clustering: when two *pairs* of
observations / clusters have exactly the same distance, which of
the two pairs is "merged" / "fused" first is essentially arbitrary
in a bottom-up hierarchical algorithm such as agnes(), and since all
later merges build on that choice, the final clusters can differ.
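To illustrate the point about ties, here is a minimal sketch on simulated data (not Hugo's data). It assumes base R's dist(method = "binary") as a stand-in for dist.binary(), whose origin is still unknown here; the matrix X, its size (40 x 6) and the seed are arbitrary, hypothetical choices. It counts how many pairwise distances are exact duplicates, then clusters the same observations presented in a different row order. With ties, the 3-cluster cut *may* change, but it need not.

```r
## Sketch on simulated data (hypothetical, not the data from this thread)
library(cluster)   # agnes(); also provides as.hclust() for agnes objects

set.seed(1)                                                   # arbitrary seed
X <- matrix(rbinom(40 * 6, size = 1, prob = 0.5), nrow = 40)  # 40 x 6, all 0/1

## base R's "binary" distance, used here as a stand-in for dist.binary()
d <- dist(X, method = "binary")

## with only 6 binary variables, few distinct distance values are possible,
## so most of the 780 pairwise distances repeat an earlier value
n.tied <- sum(duplicated(as.vector(d)))
cat(n.tied, "of", length(d), "distances duplicate an earlier value\n")

ag <- agnes(d, diss = TRUE, method = "ward")
cl <- cutree(as.hclust(ag), k = 3)

## same observations, presented in a different (random) order
perm <- sample(nrow(X))
ag2 <- agnes(dist(X[perm, ], method = "binary"), diss = TRUE, method = "ward")
cl2 <- integer(nrow(X))
cl2[perm] <- cutree(as.hclust(ag2), k = 3)  # map labels back to original rows

## cross-tabulate the two 3-cluster partitions of the same observations
table(original = cl, permuted = cl2)
```

The final table is the same kind of comparison as the contingency table in the original report. Since the distances are identical up to row order, any disagreement beyond a relabelling of the clusters comes from tie-breaking alone.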
To reproduce your example (above), however, I need to know
*where* you got the dist.binary() function from.
It is part neither of standard R nor of the cluster package.
> On Monday, 10 June 2013, Martin Maechler <maechler at stat.math.ethz.ch>
> wrote:
>>>>>>> Hugo Varet <varethugo at gmail.com>
>>>>>>> on Sun, 9 Jun 2013 11:43:32 +0200 writes:
>> > Dear R users,
>> > I discovered something strange using the function agnes() of the
>> > cluster package in R 3.0.1 and in R 2.14.1: the clusterings obtained
>> > are different, although I ran exactly the same code.
>> hard to believe... but ..
>> > I quickly looked at the source code of the function and discovered
>> > that there was an important change: agnes() in R 2.14.1 used FORTRAN
>> > code, whereas agnes() in R 3.0.1 uses C code.
>> Well, it has done so for quite a bit longer, e.g., already in R 2.15.0.
>> > Here is one of the contingency tables between R 2.14.1 and R 3.0.1:
>> >                      classe.agnTani.2.14.1
>> > classe.agnTani.3.0.1   1   2   3
>> >                    1  74   0 229
>> >                    2   0 235   0
>> >                    3 120   0  15
>> > So I was wondering: is it normal that the C and FORTRAN code give
>> > different results?
>> It's not normal, and I'm pretty sure I have had many, many
>> examples which gave identical results.
>> Can you provide a reproducible example, please?
>> If the example is too large [for dput()], please send me the *.rda
>> file produced by
>> save(<your data>, file = <the file I need>)
>> *and* the exact call to agnes() for your data.
>> Thank you in advance!
>> Martin Maechler,
>> the one you could have e-mailed directly
>> (see maintainer("cluster")) ...
>> > Best regards,
>> > Hugo Varet
>> > [[alternative HTML version deleted]]
>> ^^^^^^^^^^^^^ try to avoid, please ^^^^^^^^^^^^^^^^^
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > and provide commented, minimal, self-contained, reproducible code.
>> yes indeed, please.