[R] Finding overlaps in vector

Fri Dec 21 23:51:27 CET 2007

If we don't need any plotting, we don't really need rect.hclust at
all.  Instead, split the values by the output of cutree, which was
stored as the names of vv.  Continuing from the prior code:

> for(el in split(unname(vv), names(vv))) print(el)
[1] 0.00 0.45
[1] 1
[1] 2
[1] 3.00 3.25 3.33 3.75 4.10
[1] 5
[1] 6.00 6.45
[1] 7.0 7.1
[1] 8
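
For anyone who wants to run this without the earlier messages, here is a
self-contained sketch of the same idea. It assumes v is the vector from the
original question; the names clusters, members and multi are only
illustrative and do not appear in the thread. It also keeps the positions of
each cluster's members, which cutree alone does not hand back as a list.

```r
# Sketch only: `v` is rebuilt from the original question; `clusters`,
# `members` and `multi` are illustrative names, not from the thread.
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)

# Single-linkage clustering cut at height 0.5; no graphics device is needed.
hc <- hclust(dist(v), method = "single")
ct <- cutree(hc, h = 0.5)

# Values grouped by cluster id.
clusters <- split(v, ct)

# Positions in `v` grouped by cluster id (the membership indexes).
members <- split(seq_along(v), ct)

# Keep only the clusters that contain more than one value.
multi <- clusters[lengths(clusters) > 1]
```

Here multi should hold the four groups listed in the original question, and
members[["4"]] gives the positions 5 to 9 of the values 3.00 through 4.10.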

On Dec 21, 2007 3:24 PM, Johannes Graumann <johannes_graumann at web.de> wrote:
> Hm, hm, rect.hclust doesn't accept "plot=FALSE" and cutree doesn't retain
> the indexes of membership ... any way short of ripping out the guts of
> rect.hclust to achieve the same result without an active graphics device?
>
> Joh
>
>
> >> # cluster and plot
> >> hc <- hclust(dist(v), method = "single")
> >> plot(hc, lab = v)
> >> cl <- rect.hclust(hc, h = .5, border = "red")
> >>
> >> # each component of list cl is one cluster.  Print them out.
> >> for(idx in cl) print(unname(v[idx]))
> > [1] 8
> > [1] 7.0 7.1
> > [1] 6.00 6.45
> > [1] 5
> > [1] 3.00 3.25 3.33 3.75 4.10
> > [1] 2
> > [1] 1
> > [1] 0.00 0.45
> >
> >> # a different representation of the clusters
> >> vv <- v
> >> names(vv) <- ct <- cutree(hc, h = .5)
> >> vv
> >    1    1    2    3    4    4    4    4    4    5    6    6    7    7    8
> > 0.00 0.45 1.00 2.00 3.00 3.25 3.33 3.75 4.10 5.00 6.00 6.45 7.00 7.10 8.00
> >
> >
> > On Dec 21, 2007 4:56 AM, Johannes Graumann <johannes_graumann at web.de>
> > wrote:
> >> <posted & mailed>
> >>
> >> Dear all,
> >>
> >> I'm trying to solve the problem of how to find clusters of values in a
> >> vector that are closer than a given value. Illustrated, this might look
> >> as follows:
> >>
> >> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
> >>
> >> When using '0.5' as the proximity requirement, the following groups would
> >> result:
> >> 0,0.45
> >> 3,3.25,3.33,3.75,4.1
> >> 6,6.45
> >> 7,7.1
> >>
> >> Jim Holtman proposed a very elegant solution in
> >> http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have
> >> modified and perused since he wrote it to me. The beauty of this approach
> >> is that it will not only work for constant proximity requirements as
> >> above, but also for overlap-windows defined in terms of ppm around each
> >> value. Now I have an additional need and have found no way (short of
> >> iteratively stepping through all the groups returned) to do it with
> >> Jim's approach: how to figure out that 6,6.45 and 7,7.1 are
> >> separate clusters?
> >>
> >> Thanks for any hints, Joh
> >>