[BioC] IRanges: Trying to cut overlapping intervals into pieces

Elizabeth Purdom epurdom at stat.berkeley.edu
Sun Jan 25 07:37:53 CET 2009


Hi Michael,
I think that for what I want to do, the calls to head and tail need some 
extra sorts and uniques so that they really do remove the first/last 
positions. head(end(ir),-1) and tail(start(ir),-1) drop an element by 
index, not by value, so the largest end (or smallest start) survives if 
it is replicated. This can happen even when the intervals themselves are 
unique. It wasn't the case in the example I sent you, but I hit the 
error when I ran it over my real data.

For example:
 > ir <- IRanges(c(1,2,3,4),c(7,5,5,7))
 > adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))),
 +                sort(unique(c(end(ir), tail(start(ir),-1)-1))))
Error in IRanges(sort(unique(c(start(ir), head(end(ir), -1) + 1))), 
sort(unique(c(end(ir),  :
   'start' and 'end' must have the same length
#instead:
 > newadj <- IRanges(sort(unique(c(start(ir), head(unique(sort(end(ir))),-1)+1))),
 +                   sort(unique(c(end(ir), tail(unique(sort(start(ir))),-1)-1))))
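
To spell out why the extra sort/unique matters, here is a minimal base R 
sketch that works on plain start/end vectors rather than on IRanges 
objects. The function name disjoint_pieces is made up for illustration 
and is not part of IRanges, and the sketch assumes every interval has 
start <= end.

```r
# Hypothetical helper (not an IRanges function), base R only.
# s and e are numeric vectors of interval starts and ends.
disjoint_pieces <- function(s, e) {
  stopifnot(length(s) == length(e), all(s <= e))
  us <- sort(unique(s))  # distinct starts, ascending
  ue <- sort(unique(e))  # distinct ends, ascending
  # A piece begins at every start, and right after every end except the
  # largest one. Because ue is sorted and unique, head(ue, -1) drops the
  # largest end by value, not just whatever happens to be stored last.
  starts <- sort(unique(c(us, head(ue, -1) + 1)))
  # A piece finishes at every end, and right before every start except
  # the smallest one.
  ends <- sort(unique(c(ue, tail(us, -1) - 1)))
  data.frame(start = starts, end = ends)
}

# The intervals from the failing example: [1,7], [2,5], [3,5], [4,7]
pieces <- disjoint_pieces(c(1, 2, 3, 4), c(7, 5, 5, 7))
print(pieces)
```

For these intervals the sketch gives the pieces [1,1], [2,2], [3,3], 
[4,5], [6,7]. Without the inner unique(sort(...)), the replicated end 7 
would add a start at 8, which is why the two vectors had different 
lengths.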

Best,
Elizabeth

Michael Lawrence wrote:
> 
> 
> On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom 
> <epurdom at stat.berkeley.edu> wrote:
> 
>     Hi,
> 
>     I am trying to take overlapping intervals and return a set of
>     intervals that do not overlap but still cover the whole region (and
>     maintain the intervals that don't overlap). In particular, I don't
>     want to merge overlapping intervals together (i.e. what the reduce
>     function in IRanges does); I want to cut them up into distinct regions.
>     For example, if I have intervals:
>     [1,6], [4,8], [7,10]
>     I want to get back the set of adjacent intervals:
>     [1,3],[4,6],[7,8],[9,10]
> 
> 
> Well that's a fun one.
> 
> ir <- IRanges(c(1, 4, 7), c(6, 8, 10))
> adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), 
> sort(unique(c(end(ir), tail(start(ir),-1)-1))))
> 
> ... is a not so nice one, but pretty fast.
>  
> But if you had a gap in those ranges, like:
> 
> ir <- IRanges(c(1, 4, 10), c(6, 8, 10))
> 
> With a gap at position 9, you would need an additional filtering step:
> 
> adj[adj %in% ir]
> 
> This last step requires the devel version of IRanges, but can be 
> emulated using !is.na(overlap(ir, adj, multiple=FALSE)).
> 
> 
>     The options I find that look like they perhaps do this (intersect or
>     setdiff?) seem to be related to the 'normal' ranges class; but this
>     class requires a gap between intervals -- no adjacent intervals --
>     which is not what I want. Is there a nice way to do this with
>     IRanges (or a not so nice one, but fast)?
> 
> 
> The intersect and setdiff functions are for any Ranges, normal or not. 
> They return normal IRanges though. Perhaps the documentation does not 
> make this clear. They probably aren't very useful functions.
>  
> 
> 
>     Similarly, is there a 'reduce' version that doesn't merge adjacent
>     intervals but only truly overlapping ones? There are a lot of
>     annotation examples where you would not want to merge adjacent
>     intervals (e.g. UTRs).
> 
> 
> Try a trick like this:
> 
> ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9))
> width(ir2) <- width(ir2) - 1
> rir2 <- reduce(ir2)
> width(rir2) <- width(rir2) + 1
> 
> Or find the overlap, reduce those that did overlap and combine that 
> result with those that did not overlap.
>  
> 
> 
>     Thanks for any assistance!
> 
> 
> Thanks for providing more use cases. We'll consider adding functionality 
> along these lines to the base package (actually the reduce one has been 
> on the TODO list for many months).
>  
> 
> 
>     Elizabeth Purdom
>     Division of Biostatistics
>     UC, Berkeley
> 
> 
>
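
The filtering step Michael describes for ranges with a gap can also be 
sketched in base R, again on plain start/end vectors. The helper name 
keep_covered is hypothetical and not part of IRanges. A piece is kept 
only if it lies inside at least one of the original intervals.

```r
# Hypothetical helper (not an IRanges function), base R only.
keep_covered <- function(piece_start, piece_end, s, e) {
  # For each piece, test whether some original interval [s, e] contains it.
  covered <- mapply(function(ps, pe) any(s <= ps & pe <= e),
                    piece_start, piece_end)
  data.frame(start = piece_start[covered], end = piece_end[covered])
}

# Original intervals with a gap at position 9: [1,6], [4,8], [10,10]
s <- c(1, 4, 10)
e <- c(6, 8, 10)

# Pieces from the endpoint construction, including the spurious [9,9]
piece_start <- c(1, 4, 7, 9, 10)
piece_end   <- c(3, 6, 8, 9, 10)

kept <- keep_covered(piece_start, piece_end, s, e)
print(kept)
```

Here the piece [9,9] falls in the gap and is dropped, leaving [1,3], 
[4,6], [7,8], [10,10].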


