[BioC] Hier.Clustering: group size effect

Heike Pospisil pospisil at zbh.uni-hamburg.de
Fri Feb 3 11:45:20 CET 2006


Hello,

I have a question concerning hierarchical clustering and the effect of group sizes.

I would like to select genes that are differentially expressed between group A 
and group B, and then cluster the samples using these genes. In principle this 
works fine, but I run into a problem when the group sizes are very unequal, 
for example:
group A: 53 samples
group B: 12 samples
The resulting clustering brings the group B samples together, but they are not 
clearly separated from group A. However, if I randomly draw 12 samples from 
group A (to get equal group sizes), the clustering is nearly perfect.

I use hclust(dist(t(exprs(sub)), method="euclidean"), method="complete"), 
where ncol(sub) is the total number of samples (group A + group B) and 
nrow(sub) is the number of significant genes. I have also tried other distance 
measures, but without improvement.
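
The comparison described above can be sketched roughly as follows. This is only 
a minimal illustration on simulated data: the matrix x stands in for 
exprs(sub), and the number of genes (50), the shift added to group B, and the 
cut into k = 2 clusters are all assumptions made for the example, not values 
from the real data set. Only hclust, dist, cutree and table from base R are 
used.

```r
# Simulated stand-in for exprs(sub): rows = selected genes, columns = samples.
# Gene count and effect size are illustrative assumptions.
set.seed(1)
n_genes <- 50
n_a <- 53   # group A size, as in the post
n_b <- 12   # group B size, as in the post
group <- factor(c(rep("A", n_a), rep("B", n_b)))

x <- matrix(rnorm(n_genes * (n_a + n_b)), nrow = n_genes)
x[, group == "B"] <- x[, group == "B"] + 1   # shift group B in every gene

# Same call as in the post, applied to all 65 samples.
hc_all <- hclust(dist(t(x), method = "euclidean"), method = "complete")
print(table(cluster = cutree(hc_all, k = 2), group = group))

# Balanced comparison: all of group B plus 12 randomly drawn group A samples.
keep <- c(sample(which(group == "A"), n_b), which(group == "B"))
hc_bal <- hclust(dist(t(x[, keep]), method = "euclidean"), method = "complete")
print(table(cluster = cutree(hc_bal, k = 2), group = group[keep]))
```

The two cross-tabulations show how the two top-level clusters line up with the 
known group labels in the unbalanced and the balanced case.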

Can anybody suggest which clustering algorithm is preferable for such unequal 
group sizes?

Thanks in advance and best wishes,
Heike
-- 
Dr. Heike Pospisil      | pospisil at zbh.uni-hamburg.de
University of Hamburg   | Center for Bioinformatics
Bundesstrasse 43        | 20146 Hamburg, Germany
phone:+49-40-42838-7303 | fax: +49-40-42838-7312


