[R] Cluster analysis: hclust manipulation possible?

Jopi Harri jopi.harri at utu.fi
Mon Nov 16 15:31:19 CET 2009


I am doing cluster analysis [hclust(Dist, method="average")] on
data that potentially contains redundant objects. As expected,
the inclusion of redundant objects affects the clustering result:
the data a1 = a2 = a3, b, c, d, e1 = e2 is likely to cluster
differently from the same data without the redundancy, i.e.,
a1, b, c, d, e1. This is apparent when the outcome is visualized
as a dendrogram.

Now, it seems that the clustering result for which the redundancy
has been eliminated is more robust for the present assignment
than that of the redundant data. Naturally, there is no problem
in the elimination: just exclude the redundant objects from Dist.
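
For concreteness, the elimination step might look like the sketch
below. The toy coordinates in X are made up for illustration, and
"redundant" is assumed to mean "at distance exactly 0 from an earlier
object"; only Dist and the hclust call come from the setup above.

```r
# Toy data (hypothetical values): a1 = a2 = a3 and e1 = e2.
X <- rbind(a1 = c(0, 0), a2 = c(0, 0), a3 = c(0, 0), b = c(1, 0),
           c = c(4, 1), d = c(9, 3), e1 = c(10.5, 3), e2 = c(10.5, 3))
Dist <- dist(X)
M <- as.matrix(Dist)

# For each object, the index of the first object at distance 0 from it
# (every object is at distance 0 from itself, so this is always defined).
rep_of <- apply(M == 0, 1, function(z) which(z)[1])

# Keep only the objects that are their own representative.
keep  <- rep_of == seq_along(rep_of)
DistU <- as.dist(M[keep, keep])

# Cluster the reduced distance object.
hcU <- hclust(DistU, method = "average")
```

Here DistU holds only a1, b, c, d and e1, and rep_of records which
retained object each dropped object duplicates.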

However, it would be very convenient to be able to include the
redundant objects in the *dendrogram* by attaching them as
0-level branches to the subtrees, i.e.:

1.0........-------........
0.5....___|__...._|_......
0.0.._|_..|..|..|.._|_....
....|.|.|.|..|..|.|...|...
...a1a2a3.b..c..d.e1.e2...

instead of

1.0........-------........
0.5....___|__...._|_......
0.0...|...|..|..|...|.....
......a1..b..c..d..e1.....

The question: Can this be accomplished in the *dendrogram plot*
by manipulating the resulting hclust data structure or by some
other means, and if yes, how?
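
One possible approach, sketched under assumptions rather than offered
as an established recipe: cluster the unique objects only, then build
a larger hclust object in which each group of duplicates is first
merged at height 0 and the original merges are renumbered to follow.
The helper expand_hclust() and the toy data are hypothetical; the
components merge, height, order and labels are the documented parts
of an hclust object. The sketch does not enforce the conventions for
the order of the two entries within a row of merge, which plotting
does not appear to require.

```r
# Toy data (hypothetical values): a1 = a2 = a3 and e1 = e2.
X <- rbind(a1 = c(0, 0), a2 = c(0, 0), a3 = c(0, 0), b = c(1, 0),
           c = c(4, 1), d = c(9, 3), e1 = c(10.5, 3), e2 = c(10.5, 3))
M <- as.matrix(dist(X))

# Representative of each object = first object at distance 0 from it.
rep_of <- apply(M == 0, 1, function(z) which(z)[1])
keep   <- rep_of == seq_along(rep_of)
groups <- unname(split(rownames(M), rep_of))  # one label vector per unique object
hcU    <- hclust(as.dist(M[keep, keep]), method = "average")

# Hypothetical helper: attach duplicates to the tree of unique objects.
# groups[[j]] lists all labels belonging to leaf j of hcU.
expand_hclust <- function(hcU, groups) {
  labs   <- unlist(groups, use.names = FALSE)
  start  <- cumsum(c(0L, lengths(groups)))    # offset of each group in labs
  merge0 <- NULL                              # the added height-0 merges
  ref    <- integer(length(groups))           # how later merges refer to group j
  for (j in seq_along(groups)) {
    idx    <- start[j] + seq_along(groups[[j]])
    ref[j] <- -idx[1]                         # a lone object stays a singleton
    for (i in idx[-1]) {                      # chain duplicates on at height 0
      merge0 <- rbind(merge0,
                      if (ref[j] < 0L) c(ref[j], -i) else c(-i, ref[j]))
      ref[j] <- nrow(merge0)
    }
  }
  n0 <- if (is.null(merge0)) 0L else nrow(merge0)
  m  <- hcU$merge
  # Singletons map to their group's reference; cluster indices shift by n0.
  m[] <- ifelse(m < 0L, ref[abs(m)], m + n0)
  # Each original leaf becomes a contiguous block of its group members.
  ord <- unlist(lapply(hcU$order,
                       function(j) start[j] + seq_along(groups[[j]])))
  structure(list(merge = rbind(merge0, m),
                 height = c(rep(0, n0), hcU$height),
                 order = ord, labels = labs, call = hcU$call,
                 method = hcU$method, dist.method = hcU$dist.method),
            class = "hclust")
}

hcX <- expand_hclust(hcU, groups)
```

Calling plot(hcX, hang = -1) should then draw the tree of the unique
objects with a2, a3 and e2 joined to a1 and e1 at height 0. Exact
zeros are assumed; near-duplicates would need a tolerance instead of
M == 0.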

Jopi Harri



