[BioC] find overlap of bed files of different length

Duke duke.lists at gmx.com
Tue Feb 1 19:31:37 CET 2011


On 2/1/11 12:07 PM, Kasper Daniel Hansen wrote:
> Well, clearly I have not done it, but I would expect that a decent
> implementation of my method would take less than 2 minutes (although
> it depends on length of the stuff in the BED file you started with).
> At least the computational load should not be much more than running
> findOverlaps.

I definitely want to solve my problem using R. But I am still new to R, I 
have analysis to finish, and I need something that gets the job done 
quickly (that was why I chose R in the first place, hoping that some 
Bioconductor packages would help), so I got it done with C++ first. As 
soon as I have more time to spend, I will try to make it work with R.

D.

> Kasper
>
> On Tue, Feb 1, 2011 at 10:06 AM, Duke<duke.lists at gmx.com>  wrote:
>> On 1/31/11 1:20 PM, Kasper Daniel Hansen wrote:
>>> Use findOverlaps to find all cases.  This is usually the hard and big
>>> computation.  Then use for example pintersect() to compute the actual
>>> overlap in percent.  There might be some tedious coding involved.
>> Thanks for your suggestion, Kasper, though honestly I have not tried it yet.
>> Based on what Martin and you suggested, I expected the final code to run
>> slowly, because it has to split the data by strand/subset and process each
>> piece separately. My task is also a little more complicated: I need to find
>> gene expression (counting sequences in the exonic regions of each gene). I
>> also gave BEDTools a try, but it does not fulfil my needs (extremely slow
>> for a gene list of 28k).
>>
>> I ended up writing a C++ program to do the job. Thanks for all of your
>> suggestions and help, guys.
>>
>> D.
>>
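
The two-step approach Kasper describes in the quoted text (first find all
overlapping pairs, then compute how much of each pair actually overlaps) can
be sketched outside R as follows. This is only a minimal illustration in
plain Python, not the Bioconductor implementation and not the C++ program
mentioned in this thread. The function names find_overlaps and
overlap_percent are hypothetical, intervals are assumed to be
(chrom, start, end) tuples in BED convention (0-based, half-open), strand is
ignored, and the percentage is taken relative to the query interval width.

```python
from bisect import bisect_left
from collections import defaultdict


def find_overlaps(query, subject):
    """Return (query_index, subject_index) pairs whose intervals overlap.

    Intervals are (chrom, start, end) tuples, 0-based and half-open.
    Simple sorted scan, not an interval tree: fine as a sketch, but the
    inner loop can be slow when many subjects start before a query ends.
    """
    by_chrom = defaultdict(list)
    for j, (chrom, start, end) in enumerate(subject):
        by_chrom[chrom].append((start, end, j))
    for rows in by_chrom.values():
        rows.sort()

    hits = []
    for i, (chrom, qstart, qend) in enumerate(query):
        rows = by_chrom.get(chrom, [])
        # Subjects starting at or after qend cannot overlap the query.
        stop = bisect_left(rows, (qend,))
        for sstart, send, j in rows[:stop]:
            if send > qstart:
                hits.append((i, j))
    return hits


def overlap_percent(query, subject, hits):
    """For each hit, return (i, j, percent of the query covered by the subject).

    Assumes every query interval has non-zero width.
    """
    result = []
    for i, j in hits:
        _, qstart, qend = query[i]
        _, sstart, send = subject[j]
        shared = min(qend, send) - max(qstart, sstart)
        result.append((i, j, 100.0 * shared / (qend - qstart)))
    return result


if __name__ == "__main__":
    a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
    b = [("chr1", 150, 250), ("chr1", 390, 500), ("chr2", 50, 100)]
    pairs = find_overlaps(a, b)
    for i, j, pct in overlap_percent(a, b, pairs):
        print(a[i], b[j], pct)
```

Separating the pair search from the percentage calculation mirrors the
suggestion above: the search is the expensive part, while the per-pair
arithmetic is cheap and can be filtered afterwards by any threshold.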