[Rd] hclust: median, centroid (PR#4195)

kleiweg at let.rug.nl kleiweg at let.rug.nl
Tue Sep 16 23:23:17 MEST 2003


There seems to be a bug in hclust (package mva) for clustering
methods 'median' and 'centroid'.

I have written a clustering program in C and discovered that the
results for 'median' differ from those of hclust in R. I used a
third program, written by someone else in Pascal, and that
program agrees with the output of my program.

I found yet another clustering program that seems to be built on
the same Fortran code as was used for hclust. The source of this
code mentions a bug in the original code that affects both
methods 'median' and 'centroid'. This program has a fix for this
bug, but I can find no similar fix in the code of R's hclust.

You can find the program with the fix at:
http://www2.biology.ualberta.ca/jbrzusto/ftp/trees/source.zip
The relevant file is: qclust.c
The bug is mentioned at line 670 of that code.
The fix for the bug starts at line 908.

Unfortunately, I do not know Fortran programming, so I cannot
offer a tested solution for hclust. I hope I have located the
problem accurately enough for others to deal with it further.

You can find a data set to test this bug at:
http://www.let.rug.nl/~kleiweg/R/data
If you source this file, and then run:

    sort(hclust(d, method="median")$height)

... you will see a list with the last value:

    0.08449670

The correct value should be:

    0.081786
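
For an independent cross-check within R, the following is a minimal
sketch of a naive agglomerative clustering loop that uses the textbook
Lance-Williams update for 'median' and 'centroid'. The function name
lw_heights is hypothetical (it is not part of R or mva), the update is
applied to the dissimilarities exactly as given, and ties are broken by
taking the first minimum found, which may differ from other programs.
Whether it matches the C and Pascal programs mentioned above is an
assumption, not something verified here.

```r
# Hypothetical helper: merge heights from a naive Lance-Williams loop.
# Not part of R or mva; intended only as a slow reference for comparison.
lw_heights <- function(d, method = c("median", "centroid")) {
  method <- match.arg(method)
  m <- as.matrix(d)
  diag(m) <- Inf                  # a cluster is never merged with itself
  size <- rep(1, nrow(m))         # number of points in each cluster
  active <- rep(TRUE, nrow(m))    # clusters that still exist
  heights <- numeric(0)
  while (sum(active) > 1) {
    idx <- which(active)
    sub <- m[idx, idx, drop = FALSE]
    # closest pair among the active clusters (first match on ties)
    pos <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    i <- idx[pos[1]]
    j <- idx[pos[2]]
    dij <- m[i, j]
    heights <- c(heights, dij)
    if (method == "median") {
      ai <- 0.5; aj <- 0.5; b <- -0.25
    } else {                      # centroid: weights depend on cluster sizes
      n <- size[i] + size[j]
      ai <- size[i] / n
      aj <- size[j] / n
      b <- -size[i] * size[j] / n^2
    }
    # Lance-Williams update: distance from every other cluster k to (i u j),
    # stored in row/column i; cluster j is retired afterwards
    for (k in setdiff(idx, c(i, j))) {
      m[i, k] <- m[k, i] <- ai * m[i, k] + aj * m[j, k] + b * dij
    }
    size[i] <- size[i] + size[j]
    active[j] <- FALSE
  }
  heights                         # in merge order, not necessarily increasing
}
```

With the data set above loaded, sort(lw_heights(d, "median")) can be
compared with sort(hclust(d, method="median")$height). Sorting is
needed because 'median' and 'centroid' can produce merge heights that
are not monotonically increasing.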


--please do not edit the information below--

Version:
 platform = i686-pc-linux-gnu
 arch = i686
 os = linux-gnu
 system = i686, linux-gnu
 status =
 major = 1
 minor = 7.1
 year = 2003
 month = 06
 day = 16
 language = R

Search Path:
 .GlobalEnv, package:methods, package:ctest, package:mva, package:modreg, package:nls, package:ts, Autoloads, package:base


