[BioC] hclust and (Eisen+ de Hoon) cluster3 program

Antoine Lucas Antoine.Lucas at cgm.cnrs-gif.fr
Mon Dec 6 14:20:47 CET 2004


Dear all,

I saw (perhaps on an older version of Eisen's software) a
precision problem; I sent him this remark (Apr 2002):

--- Old Message ---
I used simple data (see below) to understand
hierarchical clustering, and I found the same
results with Maple (not very convenient!) but
with a very different precision.

Example (Distance: correlation centered, average link):

        NODE1X  GENE15X GENE10X 0.9996337890625
        NODE2X  GENE20X GENE16X 0.99957275390625
        NODE3X  GENE14X GENE11X 0.99835205078125

Maple:             v
        Node1: .99959179339780276201
        Node2: .99956936766825333998
        Node3: .99833748845958267738

I thought that Cluster used double precision, but
then it should give something like 15 correct digits.
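
To put the digit counts side by side, here is a small Python sketch (not part of the original exchange) that estimates how many significant digits each Cluster figure shares with the corresponding Maple value, using the relative error -log10(|a - b| / |b|), and compares that with the roughly 15 decimal digits an IEEE 754 double carries (`sys.float_info.dig`). The node values are copied from the example above; the names `NODES` and `correct_digits` are illustrative only.

```python
import math
import sys

# Node similarities from the example above: (Cluster output, Maple reference).
NODES = {
    "Node1": (0.9996337890625, 0.99959179339780276201),
    "Node2": (0.99957275390625, 0.99956936766825333998),
    "Node3": (0.99835205078125, 0.99833748845958267738),
}


def correct_digits(approx: float, reference: float) -> float:
    """Approximate number of correct significant digits in `approx`."""
    rel_err = abs(approx - reference) / abs(reference)
    return math.inf if rel_err == 0 else -math.log10(rel_err)


if __name__ == "__main__":
    print(f"a double carries about {sys.float_info.dig} decimal digits")
    for name, (cluster, maple) in NODES.items():
        digits = correct_digits(cluster, maple)
        print(f"{name}: Cluster agrees with Maple to ~{digits:.1f} digits")
```

On these three nodes the agreement comes out at roughly four to five digits, well short of what double precision alone would allow.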

Fortunately, the data set was very small and all values
had the same order of magnitude, but a computer scientist
told me that floating-point precision is much worse
when the operands (in addition, subtraction, ...)
differ greatly in magnitude.
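
The effect the computer scientist describes is easy to reproduce. In IEEE 754 double precision the gap between representable numbers near 1e16 is 2, so adding 1.0 to 1e16 is lost entirely and the following subtraction gives 0.0 instead of 1.0, while `math.fsum` tracks the lost low-order bits and returns the exact result. This Python sketch only illustrates the general phenomenon; it is not a claim about what Cluster does internally.

```python
import math

big, small = 1e16, 1.0

# Naive left-to-right evaluation: 1e16 + 1.0 rounds back to 1e16,
# so the small operand vanishes before the subtraction happens.
naive = (big + small) - big

# math.fsum keeps exact partial sums and returns the correctly rounded total.
exact = math.fsum([big, small, -big])

# Distance from 1e16 to the next representable double (Python 3.9+).
gap_at_big = math.ulp(big)

print(naive, exact, gap_at_big)
```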

--------------
Data:

UNIQID  NAME    GWEIGHT GORDER  "V1"    "V2"    "V3"
EWEIGHT                         1       1       1
"A1"            1       1       2       16      18
"A2"            1       2       12      9       7
"A3"            1       3       9       10      4
"A4"            1       4       5       2       12
"A5"            1       5       12      14      7
"A6"            1       6       9       16      10
"A7"            1       7       8       10      10
"A8"            1       8       10      6       6
"A9"            1       9       14      1       28
"A10"           1       10      9       10      23
"A11"           1       11      9       16      27
"A12"           1       12      17      12      37
"A13"           1       13      15      5       23
"A14"           1       14      7       14      29
"A15"           1       15      11      8       29
"A16"           1       16      4       16      37
"A17"           1       17      32      25      34
"A18"           1       18      28      35      30
"A19"           1       19      30      28      23
"A20"           1       20      32      22      28
"A21"           1       21      25      22      26
"A22"           1       22      27      33      26
"A23"           1       23      28      33      31
"A24"           1       24      36      28      31
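
For readers who want to re-check these numbers without Maple, here is a minimal pure-Python sketch of pairwise average linkage with the centered-correlation distance 1 - r, run on the table above in double precision. It rests on two assumptions: that Cluster's `GENEkX` labels are 0-based row indices (so GENE15X and GENE10X would be A16 and A11), and that the value Cluster prints for a node is the average similarity r rather than the distance. Under those assumptions the first three merges should come out as {A11, A16}, {A17, A21} and {A12, A15}, with similarities matching the Maple figures. `DATA`, `centered_correlation` and `average_linkage` are illustrative names, not Cluster or R functions.

```python
import math
from itertools import combinations

# Expression values (V1, V2, V3) copied from the data table above.
DATA = {
    "A1": (2, 16, 18), "A2": (12, 9, 7), "A3": (9, 10, 4), "A4": (5, 2, 12),
    "A5": (12, 14, 7), "A6": (9, 16, 10), "A7": (8, 10, 10), "A8": (10, 6, 6),
    "A9": (14, 1, 28), "A10": (9, 10, 23), "A11": (9, 16, 27), "A12": (17, 12, 37),
    "A13": (15, 5, 23), "A14": (7, 14, 29), "A15": (11, 8, 29), "A16": (4, 16, 37),
    "A17": (32, 25, 34), "A18": (28, 35, 30), "A19": (30, 28, 23), "A20": (32, 22, 28),
    "A21": (25, 22, 26), "A22": (27, 33, 26), "A23": (28, 33, 31), "A24": (36, 28, 31),
}


def centered_correlation(x, y):
    """Pearson correlation after subtracting each profile's mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def average_linkage(data):
    """Naive agglomerative clustering with average linkage on distance 1 - r.

    Returns merges as (left_members, right_members, similarity), where
    similarity = 1 - (mean pairwise distance) = mean pairwise r.
    """
    names = list(data)
    sim = {(a, b): centered_correlation(data[a], data[b])
           for a, b in combinations(names, 2)}
    sim.update({(b, a): s for (a, b), s in sim.items()})  # make lookups symmetric
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            ci, cj = clusters[i], clusters[j]
            s = sum(sim[a, b] for a in ci for b in cj) / (len(ci) * len(cj))
            if best is None or s > best[0]:
                best = (s, i, j)
        s, i, j = best
        merges.append((clusters[i], clusters[j], s))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    for left, right, s in average_linkage(DATA)[:3]:
        print(f"{'+'.join(left)} | {'+'.join(right)}: {s:.17f}")
```

The search is cubic in the number of genes, which is fine for 24 profiles but would need a smarter algorithm for a real array.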


---
On Mon, 06 Dec 2004 13:09:46 +0100
Benjamin Haibe-Kains <bhaibeka at ulb.ac.be> wrote:

> Hi Michael,
> 
> I think that the differences are too large to be due to different 
> implementation decisions. Actually, my problem is that with the 'centroid' 
> method of hclust (using cutree to get the two main groups), one group 
> contains a single object and the other contains everything else, which is 
> not the case with other software. It looks like a bug in the Fortran 
> routine, but I cannot access it.
> 
> Have you reported this "bug" before? Can I write my own 'centroid' method 
> easily?
> 
> cheers,
> 
> benjamin
> 
> michael watson (IAH-C) wrote:
> 
> >Benjamin
> >
> >You will likely get different results from all clustering software, even
> >when using the same parameters.  This is because many arbitrary
> >decisions have to be made during a hierarchical cluster analysis and
> >different programmers will make those decisions in different ways.
> >
> >Mick
> >
> >-----Original Message-----
> >From: bioconductor-bounces at stat.math.ethz.ch
> >[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Benjamin
> >Haibe-Kains
> >Sent: 06 December 2004 11:05
> >To: Bioconductor Mailing List
> >Subject: [BioC] hclust and (Eisen+ de Hoon) cluster3 program
> >
> >
> >Hi all,
> >
> >I have a problem with the R function 'hclust'. I have noticed 
> >differences in clustering when I use the 'centroid' cluster method with 
> >'hclust' and the cluster3 program (see M. Eisen and M. de Hoon).
> >
> >Have you noticed any differences too?
> >
> >I am using:
> >
> >hclust from library 'stats' (Built: R 2.0.1; i386-pc-linux-gnu; 
> >2004-11-15 15:56:06; unix)
> >cluster 3.0 using C Clustering Library version 1.25
> >
> >Thanks a lot
> >
> >  
> >
> 
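
On Benjamin's question of writing a 'centroid' method by hand, and Mick's point about arbitrary decisions, here is a minimal Python sketch. It assumes centroid linkage means "merge the two clusters whose mean profiles are closest in Euclidean distance", which is one common definition; R's own `?hclust` examples pair the centroid method with squared Euclidean distances (`dist(x)^2`), so the distance passed to `hclust` may be worth checking when results differ. The `tie_break` parameter and the helper names are hypothetical, added only to show how one arbitrary choice (which of several equally close pairs to merge first) changes the two-group cut.

```python
import math
from itertools import combinations


def centroid_linkage(points, tie_break="first"):
    """Naive centroid-linkage clustering on Euclidean distance.

    At each step the two clusters with the closest centroids are merged and
    the new centroid is the size-weighted mean. Pairs are scanned in index
    order; on equal distances, "first" keeps the earliest pair found and
    "last" the latest. Returns merges as (members_a, members_b, distance).
    """
    if tie_break not in ("first", "last"):
        raise ValueError("tie_break must be 'first' or 'last'")
    # Each cluster is (member indices, centroid, size).
    clusters = [((i,), tuple(map(float, p)), 1) for i, p in enumerate(points)]
    merges = []
    while len(clusters) > 1:
        best_d, best_pair = math.inf, None
        for i, j in combinations(range(len(clusters)), 2):
            d = math.dist(clusters[i][1], clusters[j][1])
            if d < best_d or (tie_break == "last" and d == best_d):
                best_d, best_pair = d, (i, j)
        i, j = best_pair
        (mi, ci, ni), (mj, cj, nj) = clusters[i], clusters[j]
        centroid = tuple((ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj))
        merges.append((mi, mj, best_d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((mi + mj, centroid, ni + nj))
    return merges


def cut_two_groups(merges):
    """Split at the root merge, similar in spirit to cutree(k = 2)."""
    left, right, _ = merges[-1]
    return sorted([sorted(left), sorted(right)])


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,)]  # d(0,1) == d(1,2): a tie
    print(cut_two_groups(centroid_linkage(pts, "first")))
    print(cut_two_groups(centroid_linkage(pts, "last")))
```

With three equally spaced points, the first and last choices produce different two-group partitions even though both are valid centroid-linkage trees.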

-- 
Antoine Lucas
Centre de génétique Moléculaire, CNRS
91198 Gif sur Yvette Cedex
Tel: (33)1 69 82 38 89
Fax: (33)1 69 82 38 77


