[R] non-uniqueness in cluster analysis

Bruno Giordano bruno at speech.kth.se
Wed Dec 3 16:53:01 CET 2003


In the presence of tied distances, I randomized which of the tied pairs was
merged and measured the distortion of each resulting solution with the
cophenetic correlation.
I computed 10000 "random" trees for each of three methods: average, single
and complete linkage.
Across these randomly selected solutions, average linkage gave the highest
cophenetic correlation, followed by complete and then single linkage. Single
linkage gave the same cophenetic correlation for every "random" tree, as
expected, since its merge heights do not depend on how ties are broken.
My data set is rather small (25 objects). I'm seriously thinking of
enumerating all the possible solutions (about 30000, I guess), picking the
ones with the highest cophenetic correlation, and, after establishing a
"natural" number of clusters, analyzing how consistent those solutions are.
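
A minimal sketch of this procedure in R, not the code actually used for the
results above. It makes several assumptions: the data are hypothetical
(25 objects on an integer grid, so that many distances are tied); ties are
not randomized inside the algorithm but indirectly, by permuting the order
of the objects before calling hclust(), whose result may depend on input
order when distances are tied; the helper random_tree(), n_trees = 200 and
k = 4 are illustrative choices only.

```r
set.seed(42)

# Hypothetical data: 25 objects on an integer grid, so many distances tie.
x <- matrix(sample(0:5, 50, replace = TRUE), ncol = 2)
d <- dist(x, method = "manhattan")
n <- attr(d, "Size")

# Build one tree on a randomly reordered copy of the distance matrix and
# return it with its permutation and cophenetic correlation.
random_tree <- function(d, method) {
  perm   <- sample(attr(d, "Size"))
  d_perm <- as.dist(as.matrix(d)[perm, perm])
  hc     <- hclust(d_perm, method = method)
  list(hc = hc, perm = perm,
       coph = cor(as.vector(d_perm), as.vector(cophenetic(hc))))
}

methods <- c("average", "single", "complete")
n_trees <- 200  # assumption: far fewer than the 10000 trees in the post

trees <- lapply(setNames(methods, methods), function(m)
  replicate(n_trees, random_tree(d, m), simplify = FALSE))

# n_trees x 3 matrix of cophenetic correlations, one column per method.
coph <- sapply(trees, function(ts) sapply(ts, `[[`, "coph"))
print(apply(coph, 2, range))

# Keep the average-linkage trees with the highest cophenetic correlation.
k    <- 4  # assumption: stands in for the "natural" number of clusters
best <- trees$average[coph[, "average"] >= max(coph[, "average"]) - 1e-12]

# Cut each retained tree into k clusters and map the memberships back to
# the original object order.
memberships <- lapply(best, function(t) {
  cl  <- cutree(t$hc, k = k)   # membership in permuted order
  out <- integer(n)
  out[t$perm] <- cl            # position i of the tree is object perm[i]
  out
})

# Cluster labels are arbitrary, so compare co-membership instead: the
# fraction of retained trees in which each pair shares a cluster.
comember <- Reduce(`+`, lapply(memberships, function(cl)
  outer(cl, cl, "=="))) / length(memberships)

# Pairs that are together in some retained trees but not in others.
frac     <- comember[upper.tri(comember)]
unstable <- sum(frac > 0 & frac < 1)
cat("retained trees:", length(best), " unstable pairs:", unstable, "\n")
```

If no pair is unstable, the retained solutions agree on the partition at
that k. Permuting the input order only samples the possible tie
resolutions; it is not guaranteed to reach all of them.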

    Bruno



