[BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps

Cook, Malcolm MEC at stowers.org
Fri Mar 21 18:56:30 CET 2014


+1 for pmap!

I like the separation of concerns this would offer.

It seems to me that combining pmap with findSpliceOverlaps should give a more general solution to the problem solved by VariantAnnotation::refLocsToLocalLocs (and should be equally performant?).
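
For reference, the arithmetic behind the second step of the quoted question (summing exon widths up to the overlap position) can be written as a vectorised base-R sketch. This is roughly the per-pair work a pmap-style mapping would do for each compatible read/transcript pair. The helper `genome_to_tx()` is hypothetical and is not part of GenomicRanges or VariantAnnotation. It assumes 1-based, inclusive, non-overlapping exon bounds for a single transcript, plus a "+"/"-" strand.

```r
# Hypothetical helper, not a Bioconductor function: map 1-based genomic
# positions to 1-based transcript coordinates for ONE transcript.
# Assumes exon_start/exon_end are inclusive, non-overlapping exon bounds
# (given in any order). Positions outside every exon map to NA.
genome_to_tx <- function(pos, exon_start, exon_end, strand = c("+", "-")) {
  strand <- match.arg(strand)
  o <- order(exon_start)                      # work in ascending genomic order
  s <- exon_start[o]
  e <- exon_end[o]
  w <- e - s + 1                              # exon widths
  before <- c(0, cumsum(w))[seq_along(w)]     # exonic bases left of each exon
  k <- findInterval(pos, s)                   # last exon starting at or before pos
  inside <- k > 0
  inside[inside] <- pos[inside] <= e[k[inside]]  # drop intronic/flanking positions
  tx <- rep(NA_real_, length(pos))
  tx[inside] <- before[k[inside]] + (pos[inside] - s[k[inside]] + 1)
  # Minus-strand transcripts start at the rightmost exon end, so count from there.
  if (strand == "-") tx <- sum(w) - tx + 1
  tx
}

# Example transcript with exons 100-199, 300-349, 500-599 (250 exonic bases).
ex_start <- c(100, 300, 500)
ex_end   <- c(199, 349, 599)
genome_to_tx(c(100, 250, 349, 599), ex_start, ex_end, "+")  # -> 1 NA 150 250
genome_to_tx(c(100, 250, 349, 599), ex_start, ex_end, "-")  # -> 250 NA 101 1
```

For a compatible hit from findSpliceOverlaps(), calling this on the read's start and end gives its transcript interval (the two swap roles on the minus strand). Because findInterval() is vectorised, all positions against one transcript are mapped in a single call. Doing this across many (read, transcript) pairs still needs a loop such as mapply(), which is the per-pair step a dedicated pmap method would take over.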


 >-----Original Message-----
 >From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Michael Lawrence
 >Sent: Friday, March 21, 2014 12:17 PM
 >To: rubi [guest]
 >Cc: GenomicRanges Maintainer; bioconductor at r-project.org; nimrod.rubinstein at gmail.com
 >Subject: Re: [BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
 >Currently there is
 >m <- map(granges, grangeslist)
 >Where 'm' is a RangesMapping indicating the within overlaps (Hits) and the
 >mapped ranges. You would get the granges from the GAlignments with the
 >granges() function. The problem is that the overlap computation uses
 >findOverlaps(type="within") instead of findSpliceOverlaps. One idea would
 >be to take a Hits object as an optional argument. Or, we could add a "pmap"
 >method that would assume the from and to are matched up already and simply
 >perform the mapping.
 >One quick fix would be to create a granges that consists of a width-1 range at
 >the start position (and likewise the end position) for each read and pass
 >it to map() as above. Then filter the mappings based on the compatibility
 >results from findSpliceOverlaps(). Not that pretty, nor very efficient, but
 >it takes care of the nasty stuff.
 >On Fri, Mar 21, 2014 at 9:44 AM, rubi [guest] <guest at bioconductor.org> wrote:
 >> Hi,
 >> I was wondering whether it is possible in any way to obtain the overlap
 >> coordinates when intersecting a GAlignments object (as the query) with a
 >> GRangesList object using the findSpliceOverlaps function.
 >> Specifically, I would like to obtain the transcriptomic coordinates of the
 >> GAlignments in the transcripts that they compatibly intersect with.
 >> Right now I'm obtaining this information with a two-step approach:
 >> 1. findSpliceOverlaps(GAlignments, GRangesList, ignore.strand=FALSE)
 >> 2. Keeping only the compatible hits, I intersect each GAlignment again with
 >> the exon ranges of its compatible transcript (GRangesList element) and sum
 >> the widths of the exons up to the intersection coordinate.
 >> My problem is that the second step is extremely slow.
 >> I'd be grateful for some discussion.
 >>  -- output of sessionInfo():
 >> R version 3.0.2 (2013-09-25)
 >> Platform: x86_64-unknown-linux-gnu (64-bit)
 >> locale:
 >>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 >>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 >>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 >>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 >> attached base packages:
 >> [1] parallel  stats     graphics  grDevices utils     datasets  methods
 >> [8] base
 >> other attached packages:
 >> [1] hash_2.2.6           data.table_1.8.10    Rsamtools_1.14.3
 >> [4] Biostrings_2.30.1    GenomicRanges_1.14.4 XVector_0.2.0
 >> [7] IRanges_1.20.6       BiocGenerics_0.8.0
 >> loaded via a namespace (and not attached):
 >> [1] bitops_1.0-6   stats4_3.0.2   tools_3.0.2    zlibbioc_1.8.0
 >> --
 >> Sent via the guest posting facility at bioconductor.org.
 >> _______________________________________________
 >> Bioconductor mailing list
 >> Bioconductor at r-project.org
 >> https://stat.ethz.ch/mailman/listinfo/bioconductor
 >> Search the archives:
 >> http://news.gmane.org/gmane.science.biology.informatics.conductor
