[BioC] summarizeOverlaps: number of dropped reads

Valerie Obenchain vobencha at fhcrc.org
Wed Apr 23 18:46:53 CEST 2014

Hi Gabriele,

There are two approaches you can take.

1) Reverse the role of 'reads' and 'features' in summarizeOverlaps()

If your reads are in an R object (not bam file)  you can possibly 
reverse the role of the arguments. Check 
showMethods("summarizeOverlaps") to confirm a method exists for your 
object types.

summarizeOverlaps() reports overlaps in terms of the annotation, i.e., 
the return count vector is the same length as the 'features' arg. To 
answer your question we need overlap counts in terms of the reads. When 
you call summarizeOverlaps() provide your reads as the 'features' arg 
and vice versa. Use 'mode' Union with 'inter.feature=FALSE'. This will 
return all counts like a straightforward countOverlaps().

summarizeOverlaps(features=my_reads, reads=my_annotation,
                   mode=Union, inter.feature=FALSE)

The count result is the same length as my_reads. Reads with a count >1 
are those that hit multiple features. From this output you can also 
distinguish the reads with no hits.

2) countOverlaps()

Alternatively you can just use countOverlaps().

If reads are in a GRanges / GAlignments:

   co <- countOverlaps(my_reads, my_annotation)

As with the summarizeOverlaps() example, the return vector is the same 
length as my_reads and counts the number of times that read was 
overlapped. You want the reads with counts > 1.

   my_reads[co > 1]

or just

   sum(co > 1)

If reads are in a bam file use readGAlignments() for single-end and 
readGAlignmentPairs() or readGAlignmentsList() for paired-end.

   my_reads <- readGAlignments(pathToBam)

then countOverlaps() as above.


On 04/23/2014 06:43 AM, Gabriele Schweikert wrote:
> Hello,
> I'm using summarizeOverlaps with mode=union and inter.feature=TRUE to
> count reads that overlap annotated features. This is great and exactly
> what I need. However, I would also like to know how many reads are
> dropped in this setting because they map to multiple features (as
> opposed to reads not mapping to any of the annotated features.) Is there
> an easy way to get this number?
> I was thinking of counting again, this time with inter.feature=FALSE and
> subtracting the results, but this is not quite what I want because
> multiple reads are then also counted multiple times.
> Thanks for the assistance
> best wishes
> Gabriele

Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fhcrc.org
Phone: (206) 667-3158

More information about the Bioconductor mailing list