[R] Any functions to manipulate (merge, cut, remove) hclust objects? (maybe through phylo?)

Martin Maechler maechler at stat.math.ethz.ch
Wed Dec 29 15:01:37 CET 2010


>>>>> Tal Galili <tal.galili at gmail.com>
>>>>>     on Wed, 29 Dec 2010 14:08:26 +0200 writes:

    > Hello Martin,
    > Thank you for the reference to the "cut" option in the dendrogram help page!
    > I guess I was too focused on looking for a solution to the hclust object
    > than to think that such a method existed for dendrograms.

    > The cut.dendrogram  doesn't solve my problem yet, since what I'm looking for
    > is the output of something like:
    > cutree(hc.object, k = 3)

    > which is a vector indicating which cluster each item belongs to.

indeed; and that's only indirectly the result of  a cut(*, h= .) 
call.

BTW: cutree() internally translates an
     'h = *' specification into a  'k = *' one,
  which is actually a bit peculiar: a cut at a given height is well-defined,
  but a cut into a given number of clusters may *NOT* be well
  defined when two sub-branches have exactly the same
  height 'h'. Going from  h  to  'h - eps'  then adds
  *two* new clusters, i.e., a step  k --> k+2,
  so that cutree(*, k+1) is not really well defined.
  In that case the cutree() internal algorithm uses the (somewhat)
  arbitrary order of the merges to define the grouping.
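
A small illustration of the tie case described above. This is only a sketch:
the four 1-D points are made up so that two merges happen at exactly the same
height under single linkage.

```r
# Toy data (an assumption for illustration): two tight pairs, far apart.
x <- c(a = 0, b = 1, c = 10, d = 11)
hc <- hclust(dist(x), method = "single")

# The two pairs merge at the same height; the final merge is much higher.
print(hc$height)

# Cutting by height is unambiguous: just below the tie there are 4 clusters,
# just above it there are 2. No height gives exactly 3.
print(cutree(hc, h = 0.5))
print(cutree(hc, h = 1.5))

# Asking for k = 3 still returns 3 groups, but which of the two pairs
# stays split is decided by the order of the rows in hc$merge.
print(cutree(hc, k = 3))
print(hc$merge)
```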

Given all the above, I now tend to think that yes, indeed,
it may be most fruitful to provide
an  as.hclust.dendrogram() method, rather than just implementing
a cut() - based cutree method for dendrograms.
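
For comparison, a cut()-based membership vector for a dendrogram could be
sketched roughly as follows. The helper name cut_membership() is hypothetical
(not part of R), the example data and cut height are arbitrary, and the sketch
assumes 'h' lies below the root height and that all labels are unique.

```r
# Hypothetical helper: cutree()-like membership vector from cut(dend, h = .)
cut_membership <- function(dend, h) {
  parts <- cut(dend, h = h)$lower        # list of sub-dendrograms below 'h'
  labs  <- lapply(parts, labels)         # leaf labels of each sub-dendrogram
  setNames(rep(seq_along(labs), lengths(labs)), unlist(labs))
}

hc   <- hclust(dist(USArrests[1:10, ]))
dend <- as.dendrogram(hc)

# Choose a height strictly between two merge heights to avoid ties.
h <- mean(hc$height[7:8])

m  <- cut_membership(dend, h)
ct <- cutree(hc, h = h)

# The cluster numbering may differ, so compare via a cross-table:
# each row and each column should have exactly one non-zero cell.
print(table(cut = m[names(ct)], cutree = ct))
```

Note that the result is ordered as in the dendrogram, not as in the original
data, hence the reindexing by names(ct) before comparing.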

    > And for some reason I can't seem to understand the structure of the
    > dendrogram object using "str".

Yes;  there's a str.dendrogram() method which very nicely
shows the structure of a dendrogram;
however, if you really want to see the internal structure, you need
  str(unclass( . ))
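
For instance (a minimal example on arbitrary built-in data):

```r
hc   <- hclust(dist(USArrests[1:5, ]))
dend <- as.dendrogram(hc)

# Dispatches to the str() method for dendrograms: a compact tree view.
str(dend)

# Without the class, str() shows the underlying nested list and the
# attributes (such as "members" and "height") stored on each node.
str(unclass(dend))
```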

    > But I'll read some more and write back if I can't solve it.

    > p.s: If I succeed in writing something useful, it will be
    > my pleasure and honor to contribute it back to the r-project :)

Cool.
Actually, I now think that merge() is a much easier part than
the cutree() / as.hclust.dendrogram() one.
But that one should not be very hard either.
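
A very rough sketch of what joining two dendrograms under a new root might
look like. The function name join_dend() and the default for 'height' are
assumptions made for illustration only; the sketch also does not set the
"midpoint" attribute, which plotting code may expect.

```r
# Hypothetical helper: put two dendrograms under a common new root node.
join_dend <- function(x, y,
                      height = 1.1 * max(attr(x, "height"), attr(y, "height"))) {
  stopifnot(inherits(x, "dendrogram"), inherits(y, "dendrogram"))
  r <- list(x, y)                                   # the two branches
  attr(r, "members") <- attr(x, "members") + attr(y, "members")
  attr(r, "height")  <- height                      # assumed: above both roots
  class(r) <- "dendrogram"
  r
}

d1 <- as.dendrogram(hclust(dist(USArrests[1:4, ])))
d2 <- as.dendrogram(hclust(dist(USArrests[5:9, ])))

joined <- join_dend(d1, d2)
print(attr(joined, "members"))
print(labels(joined))
```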

As I'm officially on vacation at the moment, I may have some fun
helping with these...

Martin





    > On Wed, Dec 29, 2010 at 1:49 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

    >> >>>>> Tal Galili <tal.galili at gmail.com>
    >> >>>>>     on Wed, 29 Dec 2010 13:32:26 +0200 writes:
    >> 
    >> > Hello Martin,
    >> > Thank you for replying.
    >> 
    >> > I have two needs:
    >> 
    >> > 1) To merge two dendrograms into one.
    >> 
    >> > 2) To then run cutree on it (which works on hclust, but
    >> >    not on dendrogram).
    >> 
    >> Well, but cut() does and is prominently mentioned on the
    >> dendrogram help page (and its examples)
    >> 
    >> > I guess that if I knew how to perform both steps I would be able to do
    >> > what I'm trying to do on my data.
    >> > If nothing like this currently exists, I guess I'll simply implement a
    >> > method of cutree for a dendrogram, and see how to merge two
    >> > dendrograms together.
    >> 
    >> so you only need to program the merge / join part.
    >> 
    >> I did not take the time to understand what exactly you mean by
    >> that, but as there is no function to do that with "hclust" either,
    >> I'm convinced you should indeed rather write one for "dendrogram";
    >> as merge() is already "S3 generic", I'd call it
    >> merge.dendrogram()
    >> 
    >> If you end up finding it useful and are willing to write a help
    >> page (including examples!) for it, you may consider donating it
    >> back to the R-project ... ;-)
    >> 
    >> Regards, Martin
    >>


