[BioC] find overlap of bed files of different length

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Feb 8 18:05:38 CET 2011


Hi,

On Tue, Feb 8, 2011 at 11:59 AM, Thomas Girke <thomas.girke at ucr.edu> wrote:
> What about a more generic solution to this very reasonable utility
> request by returning a somewhat complete overlap mapping result in a
> GRanges object that contains overlap start/end positions,
> percent/absolute overlap, overlap types, etc. The relative overlap
> filtering by percent values can then be performed very easily in a
> second step, like in this example:
>
> ## Two sample GRanges objects
> library(GenomicRanges)
> grq <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
>               ranges = IRanges(seq(1, 100, by=10), end = seq(30, 120, by=10)),
>               strand = Rle(strand(c("-", "+", "-")), c(1, 7, 2)))
> grs <- shift(grq[c(2,5,6)], 5)
>
> ## Return absolute and relative overlap positions with olRanges function
> source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/rangeoverlapper.R")
> myol <- olRanges(query=grq, subject=grs, output="gr") # output="df" returns data frame
> myol
>
> ## Return overlaps that cover at least 50% of the query/subject ranges.
> myol[elementMetadata(myol)[, "OLpercQ"] >= 50]
> myol[elementMetadata(myol)[, "OLpercS"] >= 50]
>
>
> This works with both GRanges and IRanges objects as input. Wouldn't something like this
> be a nice "convenience output" argument for the findOverlaps function?

After Michael's original input as to how to solve the problem, I added
"my own" `quantifyOverlaps` function to my utility-belt which does
much of what you suggest (really, it just returns an augmented
matchMatrix).

If the devs are opposed to adding more and more arguments to
findOverlaps, perhaps a quantifyOverlaps (or similarly named) function
that explicitly does this might be a more palatable alternative.
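
For illustration, a standalone function along these lines could be
sketched as below. This is a minimal sketch, not the actual
quantifyOverlaps mentioned above: the column names (borrowed from the
olRanges example) and the data.frame return type are assumptions, and it
assumes a recent Bioconductor release in which queryHits() and
subjectHits() are used in place of matchMatrix().

```r
library(GenomicRanges)

## Hypothetical sketch: augment the findOverlaps() hits with the overlap
## coordinates and the percent of each query/subject range that is covered.
quantifyOverlaps <- function(query, subject, ...) {
  hits <- findOverlaps(query, subject, ...)
  q <- query[queryHits(hits)]      # query range for each hit
  s <- subject[subjectHits(hits)]  # subject range for each hit
  ## The shared region runs from the larger start to the smaller end.
  ol.start <- pmax(start(q), start(s))
  ol.end   <- pmin(end(q), end(s))
  ol.width <- ol.end - ol.start + 1L
  data.frame(queryHits   = queryHits(hits),
             subjectHits = subjectHits(hits),
             OLstart     = ol.start,
             OLend       = ol.end,
             OLwidth     = ol.width,
             OLpercQ     = 100 * ol.width / width(q),
             OLpercS     = 100 * ol.width / width(s))
}

## Toy data (made up for this example): only the first query range
## overlaps a subject range.
query   <- GRanges("chr1", IRanges(start = c(1, 50), end = c(20, 80)))
subject <- GRanges("chr1", IRanges(start = c(11, 200), end = c(30, 250)))
ol <- quantifyOverlaps(query, subject)

## Keep hits that cover at least 50% of the query range.
ol[ol$OLpercQ >= 50, ]
```

Keeping this separate from findOverlaps leaves that function's argument
list alone, and the percent-based filtering stays an ordinary subsetting
step on the returned table.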

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


