[R] non-uniqueness in cluster analysis

Wed Dec 3 15:49:18 CET 2003

Bruno  -

Many people add a tiny random number to each of the distances,
or deliberately randomize the input order.  This means that
no clustering is reproducible unless you keep the original
random numbers (or the seed that generated them), but it forces
you not to pay attention to minor differences.
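
A minimal sketch of the jitter idea, written in plain Python rather
than R so that it stands alone.  The function name jitter_distances,
the scale of 1e-6 and the small example matrix are all assumptions
made for illustration; the only point is that a fixed seed makes the
tie-breaking repeatable.

```python
import random

def jitter_distances(dist, scale=1e-6, seed=None):
    """Return a copy of a symmetric distance matrix with a tiny random
    amount added to every off-diagonal entry, to break exact ties.

    scale should be far smaller than any difference you care about.
    (Hypothetical helper, not part of any library.)
    """
    rng = random.Random(seed)          # private generator, seeded for repeatability
    n = len(dist)
    out = [row[:] for row in dist]     # copy, leave the input untouched
    for i in range(n):
        for j in range(i + 1, n):
            noisy = dist[i][j] + rng.uniform(0.0, scale)
            out[i][j] = noisy          # keep the matrix symmetric
            out[j][i] = noisy
    return out

# Toy dissimilarity matrix with several exactly tied entries.
d = [
    [0.0, 0.5, 0.5, 1.0],
    [0.5, 0.0, 0.5, 1.0],
    [0.5, 0.5, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]

a = jitter_distances(d, seed=42)
b = jitter_distances(d, seed=42)       # same seed, same jittered matrix
c = jitter_distances(d, seed=7)        # different seed, different tie-breaking
```

Running the clustering on several differently seeded copies and
comparing the trees shows which merges depend only on how the ties
happened to be broken.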

Ah, I think you're asking about bootstrap confidence intervals
for the set of descendants from each interior vertex.  This is
certainly a routine procedure when inferring evolutionary trees,
but I'm not sure any of that code has been re-implemented in R
or Splus.
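
A rough sketch of one common form of this, in the style of the
phylogenetic bootstrap: resample the variables (columns) with
replacement, re-cluster, and count how often each set of descendants
from the original tree reappears.  Everything here is an assumption
made for illustration: the helper names, the simple matching
dissimilarity, the naive average-linkage loop (ties go to the first
pair found) and the toy data.  It is plain Python, not a port of any
existing phylogenetics code.

```python
import random

def mismatch_dist(data, cols):
    """Simple matching dissimilarity over the chosen columns."""
    n = len(data)
    return [[sum(data[i][c] != data[j][c] for c in cols) / len(cols)
             for j in range(n)] for i in range(n)]

def average_linkage_clades(dist):
    """Naive average-linkage clustering.  Returns the set of clusters
    formed at the interior vertices, each as a frozenset of indices."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    clades = set()
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)   # on a tie the first pair found wins
        _, a, b = best
        merged = clusters[a] | clusters[b]
        clades.add(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clades

def bootstrap_support(data, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each cluster of the
    original tree is recovered."""
    rng = random.Random(seed)
    n_cols = len(data[0])
    reference = average_linkage_clades(mismatch_dist(data, range(n_cols)))
    counts = dict.fromkeys(reference, 0)
    for _ in range(n_boot):
        # resample variables with replacement
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        for clade in average_linkage_clades(mismatch_dist(data, cols)):
            if clade in counts:
                counts[clade] += 1
    return {clade: k / n_boot for clade, k in counts.items()}

# Toy data: 4 objects described by 6 categorical variables.
data = [
    ["a", "x", "p", "m", "r", "u"],
    ["a", "x", "p", "m", "r", "v"],
    ["b", "y", "q", "n", "s", "u"],
    ["b", "y", "q", "n", "s", "v"],
]
support = bootstrap_support(data, n_boot=50, seed=1)
```

Clusters whose support stays low across replicates are the ones most
likely to owe their existence to ties or to input order.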

-  tom blackwell  -  u michigan medical school  -  ann arbor  -

On Wed, 3 Dec 2003, Bruno Giordano wrote:

> Hi,
> I'm clustering objects defined by categorical variables with a hierarchical
> algorithm - average linkage.
> My distance matrix (general dissimilarity coefficient) includes several
> distances with exactly the same values.
> As far as I can see, a standard agglomerative procedure ignores this
> problem, simply selecting, among equal distances, the one that comes first.
> For this reason the resulting analysis depends strongly on the ordering of
> the objects within the raw data matrix.
> Is there a standard procedure to deal with this?
> Thanks
>     Bruno