[BioC] find overlap of bed files of different length

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Feb 1 21:11:17 CET 2011


On Tue, Feb 1, 2011 at 2:08 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
[snip]
>> My task is to count the reads from a BED file, which vary in length, that
>> fall in the exons of genes, with a controllable overlap option (as a
>> percentage, not a number of bases). Some people want to count only reads
>> that overlap over 100% of their length, while others might want to use
>> 20%, for example. This option should be very similar to minOverlap, but
>> expressed as a percentage instead of in bases.
>>
>>
> This is a reasonable request. As Kasper mentioned, it's possible with
> post-processing.
>
> E.g.:
>
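> m <- findOverlaps(query, subject)
> # fraction (0 to 1) of each query range covered by its overlapping subject
> percentOverlap <- width(ranges(m, query, subject)) /
>   width(query)[queryHits(m)]
> # use >= so that a cutoff of 1 (100%) keeps fully contained reads
> keep <- percentOverlap >= cutoff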
>
> Perhaps someone up North could add this to IRanges/GenomicRanges?
[/snip]

Hah! Very smooth!
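
For anyone following along without IRanges loaded, the arithmetic behind
Michael's snippet can be sketched in base R. This is only an illustration
under my own assumptions: plain integer vectors stand in for the range
objects, coordinates are 1-based and closed, and all the names and numbers
below (q_start, s_start, cutoff, ...) are made up for the example. It is
not how findOverlaps works internally.

```r
# Base R illustration only; plain integer vectors stand in for IRanges objects.
# Coordinates are 1-based and closed, so width = end - start + 1.
q_start <- c(1L, 10L, 30L)   # hypothetical reads (queries)
q_end   <- c(10L, 19L, 39L)
s_start <- c(1L, 15L)        # hypothetical exons (subjects)
s_end   <- c(12L, 34L)

# Enumerate every query/subject pair (findOverlaps would return only the
# overlapping ones, and far more efficiently).
pairs <- expand.grid(query = seq_along(q_start), subject = seq_along(s_start))

# Overlap width of each pair; pmax(..., 0L) zeroes out non-overlapping pairs.
ov_width <- pmax(
  pmin(q_end[pairs$query], s_end[pairs$subject]) -
    pmax(q_start[pairs$query], s_start[pairs$subject]) + 1L,
  0L
)

# Fraction of each query covered by the subject (0 to 1, not 0 to 100).
frac <- ov_width / (q_end - q_start + 1L)[pairs$query]

# Keep the pairs that overlap at all and meet the fractional cutoff.
cutoff <- 0.5
hits <- pairs[ov_width > 0 & frac >= cutoff, ]
```

Using >= rather than > here means a cutoff of 1 keeps reads that lie
entirely inside an exon, which is the "100% of the read" case from the
original question.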

Now we just need the select argument of findOverlaps to accept
"{max|min}Overlap" as a valid value and we can call it a day ... :-)
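
Until something like that exists, the same effect can probably be had with
a bit more post-processing. A rough base R sketch, assuming you already
have one row per overlapping query/subject pair together with its overlap
width (the hits table below is hypothetical and the values are invented):

```r
# Hypothetical hit table: one row per overlapping query/subject pair,
# with the overlap width already computed. All values are made up.
hits <- data.frame(
  query   = c(1L, 1L, 2L, 3L, 3L),
  subject = c(1L, 2L, 2L, 2L, 3L),
  width   = c(4L, 9L, 5L, 2L, 7L)
)

# Sort so that, within each query, the widest overlap comes first,
# then keep only the first row seen for each query.
ord  <- order(hits$query, -hits$width)
best <- hits[ord, ][!duplicated(hits$query[ord]), ]
```

Dropping the minus sign in order() would give the "minOverlap" flavour
instead. Ties go to whichever hit comes first in the table, which may or
may not be what you want.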

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


