[BioC] summarizeOverlaps: number of dropped reads
vobencha at fhcrc.org
Wed Apr 23 18:46:53 CEST 2014
There are two approaches you can take.
1) Reverse the role of 'reads' and 'features' in summarizeOverlaps()
If your reads are in an R object (not bam file) you can possibly
reverse the role of the arguments. Check
showMethods("summarizeOverlaps") to confirm a method exists for your
summarizeOverlaps() reports overlaps in terms of the annotation, i.e.,
the return count vector is the same length as the 'features' arg. To
answer your question we need overlap counts in terms of the reads. When
you call summarizeOverlaps() provide your reads as the 'features' arg
and vice versa. Use 'mode' Union with 'inter.feature=FALSE'. This will
return all counts like a straightforward countOverlaps().
The count result is the same length as my_reads. Reads with a count >1
are those that hit multiple features. From this output you can also
distinguish the reads with no hits.
Alternatively you can just use countOverlaps().
If reads are in a GRanges / GAlignments:
co <- countOverlaps(my_reads, my_annotation)
As with the summarizeOverlaps() example, the return vector is the same
length as my_reads and counts the number of times that read was
overlapped. You want the reads with counts > 1.
my_reads[co > 1]
sum(co > 1)
If reads are in a bam file use readGAlignments() for single-end and
readGAlignmentPairs() or readGAlignmentsList() for paired-end.
my_reads <- readGAlignments(pathToBam)
then countOverlaps() as above.
On 04/23/2014 06:43 AM, Gabriele Schweikert wrote:
> I'm using summarizeOverlaps with mode=union and inter.feature=TRUE to
> count reads that overlap annotated features. This is great and exactly
> what I need. However, I would also like to know how many reads are
> dropped in this setting because they map to multiple features (as
> opposed to reads not mapping to any of the annotated features.) Is there
> an easy way to get this number?
> I was thinking of counting again, this time with inter.feature=FALSE and
> subtracting the results, but this is not quite what I want because
> multiple reads are then also counted multiple times.
> Thanks for the assistance
> best wishes
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109
Email: vobencha at fhcrc.org
Phone: (206) 667-3158
More information about the Bioconductor