[BioC] find overlap of bed files of different length

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Mon Jan 31 19:20:47 CET 2011


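Use findOverlaps() to find all overlapping pairs.  This is usually the
expensive part of the computation.  Then use, for example, pintersect()
on the matched pairs to get each intersection, and divide its width by
the width of the query range to get the overlap as a percentage.  There
might be some tedious coding involved.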
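
A rough sketch of those two steps, assuming a current GenomicRanges
release (the accessors were different in older versions).  The `reads`
and `genes` objects and the 50% cutoff are made-up toy values, not
anything from your data:

```r
library(GenomicRanges)

# Hypothetical toy data standing in for the BED intervals and the gene reference.
reads <- GRanges("chr1", IRanges(start = c(1, 50, 95), end = c(20, 120, 110)))
genes <- GRanges("chr1", IRanges(start = c(10, 100), end = c(60, 200)))

# Step 1: find every read/gene pair that overlaps at all.
hits <- findOverlaps(reads, genes)

# Step 2: intersect the matched pairs element-wise and express the
# intersection width as a fraction of the read width.
q <- reads[queryHits(hits)]
s <- genes[subjectHits(hits)]
frac <- width(pintersect(q, s)) / width(q)

# Keep only the pairs where at least half of the read lies in the gene
# (0.5 is an arbitrary example threshold).
keep <- hits[frac >= 0.5]
```

Setting the cutoff to 1 should give the same pairs as type="within",
so the "fully inside the gene" case falls out of the same code.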

Kasper

On Mon, Jan 31, 2011 at 10:30 AM, Duke <duke.lists at gmx.com> wrote:
> On 1/30/11 9:34 AM, Martin Morgan wrote:
>>
>> On 01/29/2011 04:33 PM, Duke wrote:
>>>
>>> Hi all,
>>>
>>> I need to find the overlap between a text file (BED format) and a gene
>>> reference. The BED file contains sequences of different lengths, and I
>>> need to find all the sequences that lie inside the gene (meaning the
>>> overlap percentage is 100%). I found the findOverlaps function in
>>> GenomicRanges, but the parameter to control overlap (minoverlap) does
>>> not let me control the percentage.
>>
>> the type="within" argument is available for
>> findOverlaps,IRanges,IRanges-method; you could use this by extracting
>> the ranges(gr) from your query / subject for each seqname / strand
>> subset you were interested in.
>>
>> The development version of GenomicRanges also now supports
>> findOverlaps,GenomicRanges,GenomicRanges-method, so using the
>> development version of R is also a solution.
>
> Thanks Martin for your suggestion. After posting the question, I also found
> out that the findOverlaps method for IRanges has type="within".
> Unfortunately, "within" is just one of the cases I need to handle. What I
> really want is to control the overlap percentage (quite similar to
> minoverlap, but as a percentage). Does the development version of
> GenomicRanges support that? Or do you know of any other packages
> supporting percentage overlap?
>
> Thanks,
>
> D.
