[R] Finding overlaps in vector

Johannes Graumann johannes_graumann at web.de
Sat Dec 22 21:16:28 CET 2007


Enlightening. Thanks.

Joh

Gabor Grothendieck wrote:

> If you want indexes, i.e. 1, 2, 3, ... instead of the values in v you
> can still use split -- just split on seq_along(v) instead of v (or if
> v had names you might want to split along names(v)):
> 
> split(seq_along(v), ct)
> 
> and if you only want to retain groups with 2+ elements, you can
> keep just those with Filter:
> 
> twoplus <- function(x) length(x) >= 2
> Filter(twoplus, split(seq_along(v), ct))
> 
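Below is a self-contained version of these steps. It assumes the example vector from the original question is stored in v, because the earlier message that defines v is not quoted in full here. The code clusters the values, cuts the tree at height 0.5, and keeps the index groups with two or more members. No graphics device is opened. Treat it as a sketch, not a definitive recipe.

```r
# Assumption: v holds the example vector from the original question
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)

# Single-linkage clustering cut at height 0.5; no plotting involved
hc <- hclust(dist(v), method = "single")
ct <- cutree(hc, h = 0.5)

# Positions (not values) of the members of each cluster, keyed by cluster id
idx <- split(seq_along(v), ct)

# Keep only clusters with at least two members
twoplus <- function(x) length(x) >= 2
idx2 <- Filter(twoplus, idx)

# Map the retained index groups back to their values
lapply(idx2, function(i) v[i])
```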
> On Dec 22, 2007 5:12 AM, Johannes Graumann <johannes_graumann at web.de>
> wrote:
>> But cutree does away with the indexes from the original input, which
>> rect.hclust retains.
>> I will have no other choice than to match that input against the
>> 'values' contained in the clusters ...
>>
>> Joh
>>
>>
>> Gabor Grothendieck wrote:
>>
>> > If we don't need any plotting we don't really need rect.hclust at
>> > all.  Split the output of cutree, instead.  Continuing from the
>> > prior code:
>> >
>> >> for(el in split(unname(vv), names(vv))) print(el)
>> > [1] 0.00 0.45
>> > [1] 1
>> > [1] 2
>> > [1] 3.00 3.25 3.33 3.75 4.10
>> > [1] 5
>> > [1] 6.00 6.45
>> > [1] 7.0 7.1
>> > [1] 8
>> >
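Splitting on names(vv) has one caveat. Names are character strings, and split() turns a character grouping into a factor whose levels sort as strings. With ten or more clusters, "10" would therefore come before "2". The minimal sketch below avoids this by splitting on the integer ids returned by cutree, which keeps the groups in numeric order. It assumes the same example vector as above.

```r
# Assumption: same example vector and cut height as in the thread
v  <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
ct <- cutree(hclust(dist(v), method = "single"), h = 0.5)

# Integer grouping: the factor levels created by split() sort numerically
groups <- split(v, ct)
for (el in groups) print(el)

# By contrast, character ids sort as strings, which puts "10" before "2"
levels(factor(as.character(1:10)))
```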
>> > On Dec 21, 2007 3:24 PM, Johannes Graumann <johannes_graumann at web.de>
>> > wrote:
>> >> Hm, hm, rect.hclust doesn't accept "plot=FALSE" and cutree doesn't
>> >> retain the indexes of membership ... Is there any way, short of
>> >> ripping out the guts of rect.hclust, to achieve the same result
>> >> without an active graphics device?
>> >>
>> >> Joh
>> >>
>> >>
>> >> >> # cluster and plot
>> >> >> hc <- hclust(dist(v), method = "single")
>> >> >> plot(hc, labels = v)
>> >> >> cl <- rect.hclust(hc, h = .5, border = "red")
>> >> >>
>> >> >> # each component of list cl is one cluster.  Print them out.
>> >> >> for(idx in cl) print(unname(v[idx]))
>> >> > [1] 8
>> >> > [1] 7.0 7.1
>> >> > [1] 6.00 6.45
>> >> > [1] 5
>> >> > [1] 3.00 3.25 3.33 3.75 4.10
>> >> > [1] 2
>> >> > [1] 1
>> >> > [1] 0.00 0.45
>> >> >
>> >> >> # a different representation of the clusters
>> >> >> vv <- v
>> >> >> names(vv) <- ct <- cutree(hc, h = .5)
>> >> >> vv
>> >> >    1    1    2    3    4    4    4    4    4    5    6    6    7    7    8
>> >> > 0.00 0.45 1.00 2.00 3.00 3.25 3.33 3.75 4.10 5.00 6.00 6.45 7.00 7.10 8.00
>> >> >
>> >> >
>> >> > On Dec 21, 2007 4:56 AM, Johannes Graumann
>> >> > <johannes_graumann at web.de> wrote:
>> >> >> <posted & mailed>
>> >> >>
>> >> >> Dear all,
>> >> >>
>> >> >> I'm trying to solve the problem of how to find clusters of values
>> >> >> in a vector that are closer to each other than a given distance.
>> >> >> Illustrated, this might look as follows:
>> >> >>
>> >> >> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
>> >> >>
>> >> >> When using '0.5' as the proximity requirement, the following groups
>> >> >> would result:
>> >> >> 0,0.45
>> >> >> 3,3.25,3.33,3.75,4.1
>> >> >> 6,6.45
>> >> >> 7,7.1
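For one-dimensional data, these groups can also be obtained without hclust. After sorting, the single-linkage clusters cut at height h are exactly the runs of values whose consecutive gaps are at most h. The sketch below assumes a fixed absolute threshold, and the variable name gap is illustrative. Note that this is an alternative technique, not Jim Holtman's solution referenced in the message.

```r
# Assumption: fixed absolute proximity threshold; `gap` is an illustrative name
v   <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
gap <- 0.5

o   <- order(v)                          # sort, but remember original positions
id  <- cumsum(c(1, diff(v[o]) > gap))    # start a new group when the gap is too large
grp <- split(o, id)                      # original indices for each group

# Keep groups with two or more members and show their values
grp <- Filter(function(i) length(i) >= 2, grp)
lapply(grp, function(i) v[i])
```

Each run gets its own id, so 6,6.45 and 7,7.1 come out as separate groups with no iteration over the results.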
>> >> >>
>> >> >> Jim Holtman proposed a very elegant solution in
>> >> >> http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I
>> >> >> have modified and used since he wrote it to me. The beauty of
>> >> >> this approach is that it works not only for constant proximity
>> >> >> requirements as above, but also for overlap windows defined in
>> >> >> terms of ppm around each value. Now I have an additional need, and
>> >> >> I have found no way to meet it with Jim's approach (short of
>> >> >> iteratively stepping through all the groups returned): how can I
>> >> >> figure out that 6,6.45 and 7,7.1 are separate clusters?
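The same idea extends to relative windows. Give each value a window of plus or minus ppm parts per million, sort the values, and start a new cluster whenever a window begins after every earlier window has ended. Separate runs then receive different ids directly. This is a hedged sketch, not Jim Holtman's code, which is not reproduced in this thread. It assumes non-negative values, and the function name ppm_groups and the example masses are hypothetical.

```r
# Hypothetical helper: cluster values whose ppm windows overlap (directly or
# through a chain of overlaps). Assumes non-negative values, so sorting by value
# also sorts the window start points.
ppm_groups <- function(x, ppm) {
  o    <- order(x)
  xs   <- x[o]
  half <- xs * ppm / 1e6          # half-width of each value's window
  lo   <- xs - half
  hi   <- xs + half
  # New cluster when a window starts after the furthest end seen so far
  brk <- c(TRUE, lo[-1] > cummax(hi)[-length(hi)])
  split(o, cumsum(brk))           # original indices for each cluster
}

# Hypothetical example masses, 10 ppm windows
masses <- c(500.000, 500.004, 500.020, 1000.000, 1000.008, 1000.030)
g <- ppm_groups(masses, ppm = 10)
Filter(function(i) length(i) >= 2, g)
```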
>> >> >>
>> >> >> Thanks for any hints, Joh
>> >> >>
>> >
>>
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html and provide commented,
>> > minimal, self-contained, reproducible code.


