[BioC] summarizeOverlaps mode ignoring inter feature overlaps

Valerie Obenchain vobencha at fhcrc.org
Tue Apr 9 17:42:03 CEST 2013


Hi Thomas,

On 04/08/2013 05:52 PM, Thomas Girke wrote:
> Dear Valerie,
>
> Is there currently any way to run summarizeOverlaps in a feature-overlap
> unaware mode, e.g. with an ignorefeatureOL=FALSE/TRUE setting? Currently,
> one can switch back to countOverlaps when feature overlap unawareness is
> the more appropriate counting mode for a biological question, but then
> double counting of reads mapping to multiple-range features is not
> accounted for. It would be really nice to have such a feature-overlap
> unaware option directly in summarizeOverlaps.

No, we don't currently have an option to ignore feature overlap. It 
sounds like you want to count with countOverlaps() but still have the 
double counting resolved. That is essentially what the existing 
summarizeOverlaps() modes (Union, IntersectionStrict and 
IntersectionNotEmpty) already do, so I must be missing something.

In this example, 2 reads hit feature A and 1 read hits feature B. With 
something like ignorefeatureOL=TRUE, what results would you expect to 
see? Maybe you have another, more descriptive example?

library(GenomicRanges)

reads <- GRanges("chr1", IRanges(c(1, 5, 20), width=3))
features <- GRanges("chr1", IRanges(c(1, 20), width=10,
                     names=c("A", "B")))

> countOverlaps(features, reads)
[1] 2 1
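In the example above the two features do not overlap, so countOverlaps() and summarizeOverlaps() agree. Adding a hypothetical feature C that overlaps A makes the difference visible. This is a minimal sketch assuming a current Bioconductor install, where summarizeOverlaps() lives in GenomicAlignments (it was in GenomicRanges when this thread was written). Later releases also added an 'inter.feature' argument; with inter.feature=FALSE a read is counted for every feature it overlaps, which is close to the feature-overlap-unaware option asked about here.

```r
# Assumes current Bioconductor: summarizeOverlaps() is in GenomicAlignments,
# assay() comes from SummarizedExperiment.
suppressPackageStartupMessages({
  library(GenomicRanges)
  library(GenomicAlignments)
  library(SummarizedExperiment)
})

# Same reads as above; feature C (hypothetical) overlaps A at positions 5-10
reads <- GRanges("chr1", IRanges(c(1, 5, 20), width = 3))
features <- GRanges("chr1", IRanges(c(1, 5, 20), width = 10,
                                    names = c("A", "C", "B")))

# countOverlaps() credits the read at 5-7 to both A and C
co <- countOverlaps(features, reads)

# Union mode (default inter.feature = TRUE) drops reads hitting >1 feature
se_union <- summarizeOverlaps(features, reads, mode = "Union")

# inter.feature = FALSE counts a read for every feature it overlaps
se_all <- summarizeOverlaps(features, reads, mode = "Union",
                            inter.feature = FALSE)

as.vector(co)                # 2 1 1
as.vector(assay(se_union))   # 1 0 1
as.vector(assay(se_all))     # 2 1 1
```

The ambiguous read at 5-7 is what separates the three results: countOverlaps() and inter.feature=FALSE count it twice, while the default Union mode discards it.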


>
> Another question relates to the memory usage of summarizeOverlaps. Has
> this been optimized yet? On a typical bam file with ~50-100 million
> reads the memory usage of summarizeOverlaps is often around 10-20GB. To
> use the function on a desktop computer or in large-scale RNA-Seq
> projects on a commodity compute cluster, it would be desirable if every
> counting instance would consume no more than 5GB of RAM.

Have you tried the BamFileList method? There is an example at the bottom 
of the ?BamFileList man page that uses summarizeOverlaps(). As Ryan 
mentioned, the key is to set the 'yieldSize' parameter when creating the 
BamFile, so that reads are processed in chunks of that many records 
rather than loaded all at once. This method also uses mclapply() to 
process multiple files in parallel.
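A minimal sketch of the BamFileList approach, assuming a current Bioconductor install (summarizeOverlaps() in GenomicAlignments). It uses the small ex1.bam file shipped with Rsamtools and two hypothetical features; the yieldSize value of 1000 is illustrative only and would be much larger for a real 50-100 million read file. Counting once with and once without yieldSize checks that chunking changes how many records are held in memory at a time, not the resulting counts.

```r
# Assumes current Bioconductor packages; features below are hypothetical.
suppressPackageStartupMessages({
  library(Rsamtools)
  library(GenomicAlignments)
  library(SummarizedExperiment)
})

# Small example BAM shipped with Rsamtools (sequences seq1 and seq2)
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Hypothetical features, one on each sequence
features <- GRanges(c("seq1", "seq2"),
                    IRanges(c(100, 100), width = 500,
                            names = c("f1", "f2")))

# yieldSize caps how many records are read from the file per iteration
chunked <- BamFileList(fl, yieldSize = 1000)
whole <- BamFileList(fl)   # no yieldSize: the whole file is read at once

se_chunked <- summarizeOverlaps(features, chunked, mode = "Union")
se_whole <- summarizeOverlaps(features, whole, mode = "Union")

# Same counts either way; only peak memory use differs
all(assay(se_chunked) == assay(se_whole))
```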

Valerie

>
> Thanks in advance for your help and suggestions,
>
> Thomas
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


