[BioC] summarizeOverlaps question

James W. MacDonald jmacdon at uw.edu
Wed Aug 29 18:39:39 CEST 2012


I am getting results from summarizeOverlaps(<GrangesList>, 
<BamFileList>) that I don't understand. Here is the way I normally use 
the function:

library(Rsamtools)
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
ex <- exonsBy(TxDb.Mmusculus.UCSC.mm9.knownGene, use.names = TRUE)
samps_casava <- BamFileList(<path to bamfiles>)
olaps <- summarizeOverlaps(ex, samps_casava, mode = "IntersectionNotEmpty")

and then as an example, check the counts for uc007amp.2

 > assays(olaps)$counts["uc007amp.2",, drop=F]
            GG1_ATCACG_L002.uniques.sorted.bam
uc007amp.2                                  1
            GG2_TTAGGC_L002.uniques.sorted.bam
uc007amp.2                                  1
            GG3_ACTTGA_L002.uniques.sorted.bam
uc007amp.2                                  0
            GG4_GATCAG_L002.uniques.sorted.bam
uc007amp.2                                  0
            GG5_TAGCTT_L003.uniques.sorted.bam
uc007amp.2                                  0
            GG6_GGCTAC_L003.uniques.sorted.bam
uc007amp.2                                  1
            GG7_GTGGCC_L003.uniques.sorted.bam
uc007amp.2                                  0
            GG8_GTTTCG_L003.uniques.sorted.bam
uc007amp.2                                  0
            GG9_CGTACG_L004.uniques.sorted.bam
uc007amp.2                                  0
            GG10_GAGTGG_L004.uniques.sorted.bam
uc007amp.2                                   0
            GG11_ACTGAT_L004.uniques.sorted.bam
uc007amp.2                                   1
            GG12_ATTCCT_L004.uniques.sorted.bam
uc007amp.2                                   3

However, this doesn't make any sense to me, as there are actually tons 
of reads that overlap this transcript (I know this from looking at the 
transcript and these data using IGV). So let's just summarize the 
overlaps for that transcript alone.

 > ex2 <- ex["uc007amp.2"]
 > olaps2 <- summarizeOverlaps(ex2, samps_casava, mode = 
"IntersectionNotEmpty")
 > assays(olaps2)$counts
            GG1_ATCACG_L002.uniques.sorted.bam
uc007amp.2                                 45
            GG2_TTAGGC_L002.uniques.sorted.bam
uc007amp.2                                 90
            GG3_ACTTGA_L002.uniques.sorted.bam
uc007amp.2                                 70
            GG4_GATCAG_L002.uniques.sorted.bam
uc007amp.2                                 64
            GG5_TAGCTT_L003.uniques.sorted.bam
uc007amp.2                                111
            GG6_GGCTAC_L003.uniques.sorted.bam
uc007amp.2                                 26
            GG7_GTGGCC_L003.uniques.sorted.bam
uc007amp.2                                 34
            GG8_GTTTCG_L003.uniques.sorted.bam
uc007amp.2                                 36
            GG9_CGTACG_L004.uniques.sorted.bam
uc007amp.2                                 98
            GG10_GAGTGG_L004.uniques.sorted.bam
uc007amp.2                                 153
            GG11_ACTGAT_L004.uniques.sorted.bam
uc007amp.2                                 164
            GG12_ATTCCT_L004.uniques.sorted.bam
uc007amp.2                                 174


Any ideas where I am going wrong?

Best,

Jim


-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list