[BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
MEC at stowers.org
Fri Mar 21 18:56:30 CET 2014
+1 for pmap!
I like the separation of concerns this would offer.
It seems to me that the combination of pmap and findSpliceOverlaps should afford a more general solution to the problem solved by VariantAnnotation::refLocsToLocalLocs (and should be equally performant?).
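
To make concrete what a "parallel" mapping would compute, here is a minimal base-R sketch. It is not the GenomicRanges implementation: genome_to_tx() and pmap_sketch() are hypothetical helpers, and the exon and read coordinates are made up for illustration. It assumes plus-strand transcripts whose exons are sorted by start and do not overlap, and it assumes read k has already been paired with transcript k (for example, from the compatible hits returned by findSpliceOverlaps).

```r
# Hypothetical helper (not part of any Bioconductor package): map one genomic
# position to a transcript-relative coordinate, given that transcript's exons.
# Assumes plus strand, exons sorted by start and non-overlapping.
genome_to_tx <- function(pos, exon_start, exon_end) {
  widths <- exon_end - exon_start + 1L
  # Transcript offset at which each exon begins (0 for the first exon)
  offset <- cumsum(c(0L, widths[-length(widths)]))
  i <- which(pos >= exon_start & pos <= exon_end)
  if (length(i) != 1L) return(NA_integer_)  # intronic or outside the transcript
  offset[i] + (pos - exon_start[i]) + 1L
}

# "Parallel" mapping: read k is mapped only against transcript k, so no
# overlap search is needed here; the pairing is assumed to be given.
pmap_sketch <- function(read_start, read_end, tx_exons) {
  to_tx <- function(p, ex) genome_to_tx(p, ex$start, ex$end)
  data.frame(tx_start = mapply(to_tx, read_start, tx_exons),
             tx_end   = mapply(to_tx, read_end,   tx_exons))
}

# Made-up example: two transcripts and one matched read for each
tx_exons <- list(
  data.frame(start = c(101L, 301L, 501L), end = c(200L, 400L, 600L)),
  data.frame(start = c(1001L, 2001L),     end = c(1050L, 2100L))
)
res <- pmap_sketch(read_start = c(150L, 1040L),
                   read_end   = c(320L, 2010L),
                   tx_exons   = tx_exons)
print(res)
# Read 1 maps to transcript positions 50..120, read 2 to 40..60
```

Only the start and end positions are mapped, which is also the idea behind the width-1 workaround in the quoted message below. Whether the read's internal gaps agree with the introns is left entirely to the compatibility result from findSpliceOverlaps.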
>From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Michael Lawrence
>Sent: Friday, March 21, 2014 12:17 PM
>To: rubi [guest]
>Cc: GenomicRanges Maintainer; bioconductor at r-project.org; nimrod.rubinstein at gmail.com
>Subject: Re: [BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
>Currently there is
>m <- map(granges, grangeslist)
>Where 'm' is a RangesMapping indicating the within overlaps (Hits) and the
>mapped ranges. You would get the granges from the GAlignments with the
>granges() function. The problem is that the overlap computation uses
>findOverlaps(type="within") instead of findSpliceOverlaps. One idea would
>be to take a Hits object as an optional argument. Or, we could add a "pmap"
>method that would assume the from and to are matched up already and simply
>perform the mapping.
>One quick fix would be to create a granges that consists of a width-1 range at
>the start position (and likewise the end position) for each read and pass
>it to map() as above. Then filter the mappings based on the compatibility
>results from findSpliceOverlaps(). Not that pretty nor very efficient but
>it takes care of the nasty stuff.
>On Fri, Mar 21, 2014 at 9:44 AM, rubi [guest] <guest at bioconductor.org> wrote:
>> I was wondering whether it is possible in any way to obtain the overlap
>> coordinates when intersecting GAlignments objects as query with a
>> GRangesList object, using the findSpliceOverlaps function?
>> Specifically, I would like to obtain the transcriptomic coordinates of the
>> GAlignments in the transcripts that they compatibly intersect with.
>> Right now I'm obtaining this information in a 2 step approach:
>> 1. findSpliceOverlaps(GAlignments, GRangesList, ignore.strand=FALSE)
>> 2. Keeping only the hits that are compatible, I then intersect again each
>> GAlignment and the ranges of the compatible GRange transcript and sum the
>> widths of the exons up to the intersection coordinate.
>> My problem is that the second step is extremely slow.
>> I'd be grateful for some discussion
>> -- output of sessionInfo():
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>  LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>  LC_PAPER=en_US.UTF-8 LC_NAME=C
>>  LC_ADDRESS=C LC_TELEPHONE=C
>>  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> attached base packages:
>>  parallel stats graphics grDevices utils datasets methods
>>  base
>> other attached packages:
>>  hash_2.2.6 data.table_1.8.10 Rsamtools_1.14.3
>>  Biostrings_2.30.1 GenomicRanges_1.14.4 XVector_0.2.0
>>  IRanges_1.20.6 BiocGenerics_0.8.0
>> loaded via a namespace (and not attached):
>>  bitops_1.0-6 stats4_3.0.2 tools_3.0.2 zlibbioc_1.8.0
>> Sent via the guest posting facility at bioconductor.org.
>Bioconductor mailing list
>Bioconductor at r-project.org
>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor