[BioC] summarizeOverlaps mode ignoring inter feature overlaps

Thomas Girke thomas.girke at ucr.edu
Wed May 15 01:44:22 CEST 2013


Hi Valerie,

Excellent! I really appreciate the effort that went into implementing this
consolidated solution. I will definitely put it to good use in many
of our projects and teaching efforts.

Best,

Thomas

On Tue, May 14, 2013 at 09:21:44PM +0000, Valerie Obenchain wrote:
> Hi Thomas,
> 
> Two new arguments have been added to summarizeOverlaps(): 'inter.feature' and 
> 'fragments'. They are available in GenomicRanges 1.13.11 and Rsamtools 1.13.13. 
> The ?summarizeOverlaps page in GenomicRanges now contains all of the examples, 
> rather than splitting them between GenomicRanges and Rsamtools.
> 
> 'inter.feature':
> When TRUE (default), counting is as it always was: reads that hit 
> multiple features are resolved with one of the modes or dropped. When 
> FALSE, each feature that a read hits gets a count. This essentially boils 
> down to countOverlaps() with type="any" (Union and IntersectionNotEmpty) 
> or type="within" (IntersectionStrict).
> 
> 'fragments':
> This argument is relevant when counting paired-end Bam files. It was added 
> because of the flexibility that the GAlignmentsList class offers. The 
> familiar GAlignmentPairs class holds reads that have been "properly 
> mated" by the algorithm described in ?findMateAlignment. GAlignmentsList 
> can hold these "properly mated" reads as well as singletons, reads whose 
> mates are unmapped, and any other reads in the Bam file.
> 
> When TRUE (default), both the "properly mated" reads and all others are 
> counted. You can of course still add your own filtering / QC with
> param = ScanBamParam(). When FALSE, only reads that have been "properly 
> mated" are counted.
> 
> 
> Let me know how it goes.
> Valerie
> 
> 
> 
> On 04/08/13 17:52, Thomas Girke wrote:
> > Dear Valerie,
> >
> > Is there currently any way to run summarizeOverlaps in a feature-overlap
> > unaware mode, e.g. with an ignorefeatureOL=FALSE/TRUE setting? Currently,
> > one can switch back to countOverlaps when feature overlap unawareness is
> > the more appropriate counting mode for a biological question, but then
> > double counting of reads mapping to multiple-range features is not
> > accounted for. It would be really nice to have such a feature-overlap
> > unaware option directly in summarizeOverlaps.
> >
> > Another question relates to the memory usage of summarizeOverlaps. Has
> > this been optimized yet? On a typical bam file with ~50-100 million
> > reads the memory usage of summarizeOverlaps is often around 10-20GB. To
> > use the function on a desktop computer or in large-scale RNA-Seq
> > projects on a commodity compute cluster, it would be desirable for each
> > counting instance to consume no more than 5GB of RAM.
> >
> > Thanks in advance for your help and suggestions,
> >
> > Thomas
> >
>
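As a rough illustration of the 'inter.feature' behaviour described in Valerie's reply, here is a toy Python model. It covers two of the modes, Union and IntersectionStrict, on 1-based closed integer intervals. It is an assumption-laden simplification and not the GenomicRanges implementation. The names count_reads, overlaps_any and within are hypothetical, strand is ignored, and IntersectionNotEmpty is not modelled. With inter_feature=True, a read that hits more than one feature is dropped. With inter_feature=False, every feature it hits is counted, mirroring countOverlaps() with type="any" or type="within".

```python
from typing import Dict, List, Tuple

# Toy model only: 1-based, closed integer intervals (start, end), strand ignored.
Interval = Tuple[int, int]


def overlaps_any(read: Interval, feat: Interval) -> bool:
    """True if the read and feature share at least one position (like type="any")."""
    return read[0] <= feat[1] and feat[0] <= read[1]


def within(read: Interval, feat: Interval) -> bool:
    """True if the read lies entirely inside the feature (like type="within")."""
    return feat[0] <= read[0] and read[1] <= feat[1]


def count_reads(features: Dict[str, Interval], reads: List[Interval],
                mode: str = "Union", inter_feature: bool = True) -> Dict[str, int]:
    """Hypothetical counter covering only the Union and IntersectionStrict modes."""
    if mode == "Union":
        hit = overlaps_any
    elif mode == "IntersectionStrict":
        hit = within
    else:
        raise ValueError("toy model supports only 'Union' and 'IntersectionStrict'")

    counts = {name: 0 for name in features}
    for read in reads:
        hits = [name for name, feat in features.items() if hit(read, feat)]
        if inter_feature:
            # Ambiguous reads (hitting more than one feature) are dropped.
            if len(hits) == 1:
                counts[hits[0]] += 1
        else:
            # Every feature the read hits receives a count.
            for name in hits:
                counts[name] += 1
    return counts


# Two overlapping toy features and three toy reads.
FEATURES = {"geneA": (100, 200), "geneB": (150, 300)}
READS = [(110, 140), (160, 190), (250, 350)]

for mode in ("Union", "IntersectionStrict"):
    for inter_feature in (True, False):
        print(mode, inter_feature, count_reads(FEATURES, READS, mode, inter_feature))
```

In this toy data only the read spanning 160 to 190, which lies in both genes, changes between the two settings. It is dropped when inter_feature is True and counted for both genes when it is False.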
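A similarly simplified sketch of the 'fragments' switch follows. It assumes each fragment has already been labelled as "properly mated" or not. The Fragment class and its properly_mated flag are hypothetical stand-ins for the output of the pairing step described in ?findMateAlignment. The sketch applies a Union-style rule in which a fragment hitting more than one feature is dropped. The count_all parameter is a hypothetical mirror of the 'fragments' argument as Valerie describes it, defaulting to True as in her message. Check ?summarizeOverlaps for the default in the installed version.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Toy model only: 1-based, closed integer intervals (start, end), strand ignored.
Interval = Tuple[int, int]


@dataclass
class Fragment:
    segments: List[Interval]  # two segments for a mated pair, one for a singleton
    properly_mated: bool      # stands in for the result of mate pairing


def overlaps_any(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_fragments(features: Dict[str, Interval], fragments: List[Fragment],
                    count_all: bool = True) -> Dict[str, int]:
    """Hypothetical Union-style counter with a switch mirroring 'fragments'."""
    counts = {name: 0 for name in features}
    for frag in fragments:
        if not count_all and not frag.properly_mated:
            continue  # like fragments=FALSE: skip singletons and other non-mated reads
        hits = [name for name, feat in features.items()
                if any(overlaps_any(seg, feat) for seg in frag.segments)]
        if len(hits) == 1:  # a fragment hitting several features is dropped
            counts[hits[0]] += 1
    return counts


FEATURES = {"geneA": (100, 200), "geneB": (500, 600)}
FRAGMENTS = [
    Fragment([(110, 150), (170, 210)], properly_mated=True),  # mated pair in geneA
    Fragment([(520, 560)], properly_mated=False),             # singleton in geneB
    Fragment([(130, 160)], properly_mated=False),             # mate unmapped, in geneA
]

print(count_fragments(FEATURES, FRAGMENTS, count_all=True))
print(count_fragments(FEATURES, FRAGMENTS, count_all=False))
```

Switching count_all to False removes the singleton and the read with an unmapped mate, leaving only the properly mated pair to be counted.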
