[BioC] linkage distances

David Ruau David.Ruau at rwth-aachen.de
Wed Jun 13 21:36:17 CEST 2007


A good source of information on linkage methods is this book:
"Finding Groups in Data: An Introduction to Cluster Analysis"
by L. Kaufman and P. J. Rousseeuw
(Wiley)

The book is really easy to read.
To understand linkage methods, look at chapter 5, page 199. A further
explanation is given on page 225.
There is also a quick overview on page 47. All the methods
described in the book are implemented in the R package 'cluster'.
In the end, I always use the same linkage method: UPGMA, also
called the average method.
The book mentions that you have to choose the linkage method
according to the cluster shape you are looking for.
I have never worked out how to determine the cluster shape when the
data have more than 3 dimensions... :)
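
As a rough illustration of what average (UPGMA) linkage does, here is a
minimal sketch in plain Python. It is not the code from the 'cluster'
package or from hclust; the function names (euclidean, average_linkage)
and the toy points are made up for the example. At each step it merges
the two clusters whose members have the smallest mean pairwise distance.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, dist=euclidean):
    """Naive UPGMA sketch (hypothetical helper, not a library function).

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance between their members. Returns the merges in order as
    (members_a, members_b, height), where members are point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean of all between-cluster distances.
            d = sum(dist(points[i], points[j]) for i in a for j in b)
            d /= len(a) * len(b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges


# Toy data (assumed for illustration): two tight pairs, far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
merges = average_linkage(points)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

Single and complete linkage differ only in the line that computes d:
they take the minimum or the maximum of the between-cluster distances
instead of the mean.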

What I do experiment with is the distance/similarity measure.
When talking about distance, you should distinguish between
metric distances (Euclidean...) and parametric and non-parametric
correlation measures.
Parametric correlation measures can, due to their sensitivity to
outliers, give non-homogeneous cluster solutions. In this case
non-parametric correlations, such as Spearman rank correlation or
Kendall's tau rank correlation, are preferred.
The distance used by Eisen in his 1998 paper is the cosine correlation
distance, also called uncentered Pearson, and it gives good results.
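
To make the difference between these measures concrete, here is a small
sketch in plain Python. The function names and the toy vectors are
assumptions for illustration only, and turning a correlation r into a
distance as 1 - r is a common convention rather than something taken
from the book or from Eisen's paper.

```python
import math


def uncentered_pearson(x, y):
    """Cosine similarity, i.e. Pearson correlation without centering."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def pearson(x, y):
    """Centered (ordinary) Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return uncentered_pearson([a - mx for a in x], [b - my for b in y])


def ranks(v):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy profiles (assumed): y rises with x but has one extreme value.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 100.0]

print("Pearson   :", pearson(x, y))
print("Spearman  :", spearman(x, y))
print("Uncentered:", uncentered_pearson(x, y))
print("1 - r distances:", 1 - pearson(x, y), 1 - spearman(x, y))
```

The single outlier pulls the Pearson value well below 1, while the
Spearman value stays at 1 because only the ordering matters. The
uncentered version additionally depends on the offset of each profile,
so two profiles with the same shape but different baselines are not
treated as identical.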

David
---
David Ruau
Institute for Biomedical Engineering
-Cell Biology-
Universitatsklinikum Aachen, RWTH
Pauwelsstrasse 30
52074 Aachen
GERMANY
GPG: 4210CA11

On Jun 13, 2007, at 3:13 PM, Daniel Brewer wrote:

> Hi,
>
> I have been producing some dendrograms using hclust with a variety of
> linkage distance measures.  Does anyone know or is there a good  
> resource
> that explains why one would use one linkage distance rather than  
> another?
>
> I don't really like dealing with dendrograms, but we want to produce
> groupings based on these to do differential analysis on, and I would
> like to be able to justify it.
>
> Thanks
>
> Dan
>
> -- 
> **************************************************************
> Daniel Brewer, Ph.D.
> Institute of Cancer Research
> Email: daniel.brewer at icr.ac.uk
> **************************************************************
>
> The Institute of Cancer Research: Royal Cancer Hospital, a  
> charitable Company Limited by Guarantee, Registered in England  
> under Company No. 534147 with its Registered Office at 123 Old  
> Brompton Road, London SW7 3RP.
>
> This e-mail message is confidential and for use by the add...{{dropped}}


