[R] difference between trees in R?

Mark Robinson m.robinson at utoronto.ca
Tue Aug 21 16:09:27 CEST 2001


Hi.

I am wondering whether anybody has studied and/or written code in R to
calculate the distance between two "trees".  For example, one might do a
hierarchical agglomerative clustering and, say, a hierarchical divisive
clustering (both represented as trees) and wish to compute a metric on
the pair.  I am thinking of something like the symmetric difference as
mentioned in Margush and McMorris (1982).
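
To make the idea concrete, here is a rough sketch of what I have in
mind.  It assumes both trees are available as hclust objects on the same
observations, and it uses my own reading of the symmetric difference:
count the clusters (internal nodes, as sets of members) that occur in
one tree but not in the other, ignoring merge heights.  The function
names tree_clusters and symmetric_difference are made up for this
example.

```r
# Return the non-trivial clusters of an hclust tree, each encoded as a
# comma-separated string of its sorted member indices.
tree_clusters <- function(hc) {
  n_merges <- nrow(hc$merge)
  members <- vector("list", n_merges)
  for (i in seq_len(n_merges)) {
    # In hc$merge a negative entry is a single observation and a
    # positive entry refers to the cluster formed at that earlier step.
    parts <- lapply(hc$merge[i, ], function(j) {
      if (j < 0) -j else members[[j]]
    })
    members[[i]] <- sort(unlist(parts))
  }
  # Drop the last merge (the root): it holds every observation in any tree.
  vapply(members[-n_merges], paste, character(1), collapse = ",")
}

# Number of clusters found in exactly one of the two trees.
symmetric_difference <- function(hc1, hc2) {
  a <- tree_clusters(hc1)
  b <- tree_clusters(hc2)
  length(setdiff(a, b)) + length(setdiff(b, a))
}

# Illustration on made-up data: two linkages on the same distances.
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
d <- dist(x)
hc_single <- hclust(d, method = "single")
hc_complete <- hclust(d, method = "complete")
symmetric_difference(hc_single, hc_complete)
symmetric_difference(hc_single, hc_single)   # 0: identical trees
```

This only looks at tree topology, so two trees with the same clusters
but very different merge heights would come out at distance 0.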

My application is actually a bit different from the above, so I'll
describe it.  I want to combine numerous k-means classifications into
one.  Because repeated runs of the k-means procedure give different
cluster memberships (because of different starting points), I want to
run it a number of times and combine the results into a consensus.  To
do that, I want to quantify how different a consensus of, for example,
3 k-means runs is from a consensus of 4 k-means runs (denoted here by
d(3,4)).

Presumably, the sequence d(3,4), d(4,5), ..., d(p,p+1) would keep
decreasing, and at some point I would be satisfied that no further
k-means runs need to be added to the consensus.
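
As one candidate for d(p,p+1), here is a minimal sketch.  It is only an
assumption on my part, not an established metric: the consensus of p
runs is taken to be the average of their co-membership matrices (entry
(i,j) is the fraction of runs in which observations i and j share a
cluster), and d(p,p+1) is the mean absolute change in that matrix when
one more run is added.  Working with co-memberships sidesteps the fact
that cluster labels are arbitrary from run to run.  All function names
and the simulated data are hypothetical.

```r
# Co-membership matrix of one partition: entry (i, j) is 1 when
# observations i and j share a cluster, 0 otherwise.
comembership <- function(labels) {
  outer(labels, labels, "==") * 1
}

# Consensus of the first p runs: average of their co-membership matrices.
consensus <- function(runs, p) {
  Reduce(`+`, lapply(runs[seq_len(p)], comembership)) / p
}

# Assumed definition of d(p, p+1): mean absolute change in the
# consensus matrix over all distinct pairs of observations.
consensus_distance <- function(runs, p) {
  a <- consensus(runs, p)
  b <- consensus(runs, p + 1)
  mean(abs(a - b)[upper.tri(a)])
}

# Illustration on made-up data: 10 k-means runs from random starts.
set.seed(1)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
runs <- lapply(1:10, function(i) kmeans(x, centers = 3)$cluster)
d_seq <- sapply(3:9, function(p) consensus_distance(runs, p))
d_seq   # d(3,4), d(4,5), ..., d(9,10)
```

One caveat with this definition: adding a run changes each entry of the
consensus matrix by at most 1/(p+1), so d(p,p+1) shrinks as p grows even
if the individual runs keep disagreeing.  A small value may therefore
say more about p than about the stability of the clustering.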

I thought I could represent a k-means run as a binary tree, or do a
hierarchical agglomerative clustering of a matrix of cluster memberships
(1s and 0s) from p k-means runs, but maybe this isn't the best approach.
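
The membership-matrix idea could look roughly like the sketch below.
The layout is my assumption: one row per observation and one 0/1
indicator column per cluster per run, so p runs with k clusters give
p * k columns.  The choice of the "binary" distance and average linkage
is arbitrary, and membership_matrix is a made-up name.

```r
# Build the 0/1 membership matrix: for each run, one indicator column
# per cluster, with the runs placed side by side.
membership_matrix <- function(runs) {
  do.call(cbind, lapply(runs, function(labels) {
    outer(labels, sort(unique(labels)), "==") * 1
  }))
}

# Illustration on made-up data: 5 k-means runs from random starts.
set.seed(1)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
runs <- lapply(1:5, function(i) kmeans(x, centers = 3)$cluster)

m <- membership_matrix(runs)

# Agglomerative clustering of the observations based on how often
# they fall into the same k-means clusters.
hc <- hclust(dist(m, method = "binary"), method = "average")
consensus_labels <- cutree(hc, k = 3)
table(consensus_labels)
```

Each row of m contains exactly one 1 per run, so two observations that
always share a cluster have identical rows and are merged at height 0.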

So, is there a metric on two consensuses of k-means runs?  Or is there
another approach that I can implement in R?

Many thanks for your suggestions.

M.

