[BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
MEC at stowers.org
Fri Mar 21 23:27:04 CET 2014
I think I have a possible workaround for you, involving the use of:
- pintersect: to figure out the regions of compatible overlap
- restrict: to find the left and right regions in your transcript models that are outside of the overlapping region
But can you send a test case?
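A minimal sketch of this workaround on toy data, assuming a single plus-strand transcript and one compatible read. The object names `exons` and `read` and all coordinates are made up for illustration; a real case would have to apply this across the compatible hits from findSpliceOverlaps.

```r
library(GenomicRanges)

# Hypothetical toy data: one plus-strand transcript with two exons,
# and the genomic span of one read that is compatible with it.
exons <- GRanges("chr1", IRanges(start = c(100, 300), end = c(199, 399)),
                 strand = "+")
read <- GRanges("chr1", IRanges(start = 150, end = 349), strand = "+")

# pintersect: the part of each exon that is covered by the read span.
ov <- pintersect(exons, rep(read, length(exons)))

# restrict: the exonic regions left of the read start and right of the read end.
left  <- restrict(ranges(exons), end = start(read) - 1L)
right <- restrict(ranges(exons), start = end(read) + 1L)

# Transcript coordinates of the read follow from the summed widths.
tx_start <- sum(width(left)) + 1L
tx_end   <- tx_start + sum(width(ov)) - 1L
```

With these toy values the transcript is 200 bases long and the read covers the middle part of it. Minus-strand transcripts would need the left and right sides swapped.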
From: nimrod.rubinstein [mailto:nimrod.rubinstein at gmail.com]
Sent: Friday, March 21, 2014 4:16 PM
To: Michael Lawrence
Cc: Cook, Malcolm; rubi [guest]; GenomicRanges Maintainer; bioconductor at r-project.org
Subject: Re: [BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
I see. So I assume the p in pmap stands for paired?
Any ballpark as to when this implementation will be added?
On Fri, Mar 21, 2014 at 3:59 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
Yea, pmap() would map ranges[x] to grangeslist[x], but pmap() does not exist yet. map() is all-by-all; that's the downside of it.
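To illustrate the difference with plain overlap calls (this is neither map() nor pmap() itself, and the toy ranges are hypothetical): findOverlaps() compares every query with every subject, so recovering the element-wise pairing means keeping only the hits where the query and subject indices agree.

```r
library(IRanges)

# Hypothetical toy data: query[i] is meant to be paired with subject[i].
query   <- IRanges(start = c(5, 25), end = c(8, 28))
subject <- IRanges(start = c(1, 20), end = c(30, 30))

# All-by-all: every query range is tested against every subject range.
all_hits <- findOverlaps(query, subject, type = "within")

# Element-wise pairing, recovered by filtering on matching indices.
paired_hits <- all_hits[queryHits(all_hits) == subjectHits(all_hits)]
```

Here query 2 also falls within subject 1, so the all-by-all result holds one hit more than the paired result.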
On Fri, Mar 21, 2014 at 11:49 AM, nimrod.rubinstein <nimrod.rubinstein at gmail.com> wrote:
I had assumed that map only maps ranges[x] to grangeslist[x] for every x. Do I understand correctly that it instead maps all ranges against all elements of grangeslist?
On Fri, Mar 21, 2014 at 2:39 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
On Fri, Mar 21, 2014 at 11:29 AM, nimrod.rubinstein <nimrod.rubinstein at gmail.com> wrote:
Thanks for the help.
Correct me if I'm wrong, but it seems that I first intersect the GAlignments with the GRangesList using the findSpliceOverlaps function, and then run the map function, where granges holds the ranges of the compatible GAlignments and grangeslist is the corresponding list of GRanges from the GRangesList.
That will not quite work: you will always have to filter the results from the map() call, because it may try to map things that are not compatible.
On Fri, Mar 21, 2014 at 2:20 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
On Fri, Mar 21, 2014 at 10:56 AM, Cook, Malcolm <MEC at stowers.org> wrote:
+1 for pmap!
I like the separation of concerns this would offer.
It seems to me that the combination of pmap and findSpliceOverlaps should afford a more general solution to the problem solved by VariantAnnotation::refLocsToLocalLocs (and should be equally performant?).
Yea, actually both map and refLocsToLocalLocs rely on the same underlying function for speed: GenomicRanges:::.listCumsumShifted (writing that one gave me a headache).
Unfortunately, I don't have the time to spend on things like pmap, but I would encourage someone in Seattle to take it on. There's already a method for Ranges,GAlignments, but that's the opposite direction from the one requested in this thread. I write these things as they come up in my work.
>From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Michael Lawrence
>Sent: Friday, March 21, 2014 12:17 PM
>To: rubi [guest]
>Cc: GenomicRanges Maintainer; bioconductor at r-project.org; nimrod.rubinstein at gmail.com
>Subject: Re: [BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
>Currently there is
>m <- map(granges, grangeslist)
>Where 'm' is a RangesMapping indicating the within overlaps (Hits) and the
>mapped ranges. You would get the granges from the GAlignments with the
>granges() function. The problem is that the overlap computation uses
>findOverlaps(type="within") instead of findSpliceOverlaps. One idea would
>be to take a Hits object as an optional argument. Or, we could add a "pmap"
>method that would assume the from and to are matched up already and simply
>perform the mapping.
>One quick fix would be to create a granges that consists of a width-1 range at
>the start position (and likewise the end position) for each read and pass
>it to map() as above. Then filter the mappings based on the compatibility
>results from findSpliceOverlaps(). Not that pretty nor very efficient but
>it takes care of the nasty stuff.
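A sketch of the first half of that quick fix, assuming plus-strand reads held in a GRanges called `reads` (a hypothetical name, as are `exons` and all coordinates). resize() is used here to build the width-1 anchors, and findOverlaps(type="within") stands in for the overlap step; the map() call itself and the filtering against findSpliceOverlaps() results are left out.

```r
library(GenomicRanges)

# Hypothetical read spans, as granges() would return them for a GAlignments.
reads <- GRanges("chr1", IRanges(start = c(150, 320), end = c(349, 360)),
                 strand = "+")

# Width-1 ranges anchored at each read's start and end.
# resize() is strand-aware: on the minus strand, fix = "start"
# anchors at the larger genomic coordinate.
read_starts <- resize(reads, width = 1, fix = "start")
read_ends   <- resize(reads, width = 1, fix = "end")

# A spliced read span may not be "within" any single exon,
# but its two end points are.
exons <- GRanges("chr1", IRanges(start = c(100, 300), end = c(199, 399)),
                 strand = "+")
start_hits <- findOverlaps(read_starts, exons, type = "within")
end_hits   <- findOverlaps(read_ends, exons, type = "within")
```

The first read starts in exon 1 and ends in exon 2, so its full span would fail a "within" test against either exon, while both of its end points pass.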
>On Fri, Mar 21, 2014 at 9:44 AM, rubi [guest] <guest at bioconductor.org> wrote:
>> I was wondering whether it is possible in any way to obtain the overlap
>> coordinates when intersecting GAlignments objects as query with a
>> GRangesList object, using the findSpliceOverlaps function?
>> Specifically, I would like to obtain the transcriptomic coordinates of the
>> GAlignments in the transcripts that they compatibly intersect with.
>> Right now I'm obtaining this information in a two-step approach:
>> 1. findSpliceOverlaps(GAlignments, GRangesList, ignore.strand=FALSE)
>> 2. Keeping only the hits that are compatible, I then intersect each
>> GAlignment again with the ranges of its compatible transcript (a GRanges)
>> and sum the widths of the exons up to the intersection coordinate.
>> My problem is that the second step is extremely slow.
>> I'd be grateful for some discussion
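One way the exon-width summation in step 2 might be vectorised, shown in base R on hypothetical exon coordinates for a single plus-strand transcript. `genome_to_tx` is an illustrative helper, not a GenomicRanges function, and it assumes every position passed in lies inside an exon.

```r
# Hypothetical exons of one plus-strand transcript, sorted by position.
exon_start <- c(1000L, 2000L, 3000L)
exon_end   <- c(1099L, 2049L, 3199L)
exon_width <- exon_end - exon_start + 1L

# Transcript length already consumed before each exon begins.
offset <- cumsum(c(0L, head(exon_width, -1L)))

# Illustrative helper: map exonic genomic positions to transcript coordinates.
genome_to_tx <- function(pos) {
  i <- findInterval(pos, exon_start)  # index of the exon starting at or before pos
  stopifnot(all(i >= 1L), all(pos <= exon_end[i]))  # positions must be exonic
  offset[i] + pos - exon_start[i] + 1L
}

tx <- genome_to_tx(c(1010L, 2049L, 3000L))
```

Because the cumulative offsets are computed once per transcript, each lookup is a single findInterval() call rather than a fresh intersection and sum for every read.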
>> -- output of sessionInfo():
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>  LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>  LC_PAPER=en_US.UTF-8 LC_NAME=C
>>  LC_ADDRESS=C LC_TELEPHONE=C
>>  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> attached base packages:
>>  parallel stats graphics grDevices utils datasets methods
>>  base
>> other attached packages:
>>  hash_2.2.6 data.table_1.8.10 Rsamtools_1.14.3
>>  Biostrings_2.30.1 GenomicRanges_1.14.4 XVector_0.2.0
>>  IRanges_1.20.6 BiocGenerics_0.8.0
>> loaded via a namespace (and not attached):
>>  bitops_1.0-6 stats4_3.0.2 tools_3.0.2 zlibbioc_1.8.0
>> Sent via the guest posting facility at bioconductor.org.
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> Search the archives: