[BioC] Problems Counting Reads with summarizeOverlaps

Valerie Obenchain vobencha at fhcrc.org
Sun Dec 8 19:31:05 CET 2013


summarizeOverlaps() does count paired-end reads. Use option 
singleEnd=FALSE. Details under 'singleEnd' in the 'Arguments' section on 
the man page.

library(GenomicAlignments) ## devel
library(GenomicRanges) ## release
?summarizeOverlaps

Valerie

On 12/08/13 09:42, Reema Singh wrote:
> Hi Sam,
>
> It depends on whether you are using single read or paired end reads
> files. As far as I known summarizeOverlaps function only working in case
> of single read. Anyone please correct me if I am wrong.
>
> Kind Regards
>
>
> On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
>     Hi Sam,
>
>     Are you getting any error messages related to seqlevels when you
>     count? If you have confirmed that annotation seqlevels match the bam
>     files next look at the maximum possible overlap in a single bam file.
>
>        bf <- singleBamFile
>        reads <- readGAlignments(bf)  ## assuming single-end reads
>        so <- summarizeOverlaps(tx, reads, mode=Union, inter.feature=FALSE)
>
>     'inter.feature=FALSE' with 'mode=Union' is countOverlaps with
>     'type=ANY'. Reads that hit multiple features will be double counted
>     in this case. The idea to to see if there are any common regions at all.
>
>     If there are still no counts, you could look more closely at a
>     smaller chromosome region in 'reads' where you would expect counts.
>     This visual inspection might give you some insight as to why there
>     is no overlap.
>
>
>     Valerie
>
>
>     On 12/05/13 18:51, Sam McInturf wrote:
>
>         Hello Bioconductors,
>              I am working on an Arabidopsis RNA seq set, and after
>         executing my count
>         reads script, I get a count table with no reads counted (ie
>         sum(colSums(mat)) = 0).
>
>         I mapped my reads using Tophat2 (without supplying a gtf/gff
>         file) and used
>         samtools to sort and index my accepted_hits.bam files.  I loaded
>         these bam
>         files into IGV (integrated Genome Browser) and the reads appear
>         to have
>         been mapped (additionally the align_summary.txt says I have good
>         mapping).
>            Script and session info below.
>
>         Essentially I use Rsamtools to load in my bamfiles, and then
>         makeTranscriptsFromGFF to make a txdb object, transcriptsBy to get
>         transcripts, and then summarizeOverlaps to count the reads.  My
>         gff file is
>         TAIR10.  I am relatively sure that the genome I mapped to was
>         the TAIR10
>         release, but I am not 100% on that.
>
>         I have also used biomaRt to do this (commented out), but I had
>         the same
>         results.
>
>         I am currently upgrading to R 3.0.2 and re-installing
>         bioconductor (maybe
>         it will change something?)
>
>         Thanks for any thoughts!
>
>         Script:
>
>         gffdir <-  "/data/smcintur/Annotation/__TAIR10_GFF3_genes.gff";
>         library(ShortRead)
>         library(GenomicFeatures)
>         library(Rsamtools)
>         #library(biomaRt)
>         setwd("/data/smcintur/RNASeq/__NMseq/newTophat/BamFiles/")
>         bf <- list.files(pattern = ".bam$")
>         bi <- list.files(pattern = "bam.bai$")
>         bfl <- BamFileList(bf, bi)
>
>         #mart <- makeTranscriptDbFromBiomart(__biomart =
>         "Public_TAIRV10", dataset=
>         "tairv10")
>         tx <- transcriptsBy(mart)
>         gff <- makeTranscriptDbFromGFF(__gffdir, format = "gff")
>         gff
>
>         cds <- cdsBy(gff)
>         tx <- transcriptsBy(gff)
>
>         olapTx <- summarizeOverlaps(tx, bfl)
>         olapCds <- summarizeOverlaps(cds, bfl)
>
>         txMat <- assays(olapTx)$counts
>         cdsMat <- assays(olapCds)$counts
>
>         write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
>         write.table(cdsMat, file = "countMatCdsGff.txt", sep = "\t")
>
>         sessionInfo
>         R version 3.0.0 (2013-04-03)
>         Platform: x86_64-unknown-linux-gnu (64-bit)
>
>         locale:
>            [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>            [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>            [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>            [7] LC_PAPER=C                 LC_NAME=C
>            [9] LC_ADDRESS=C               LC_TELEPHONE=C
>         [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>         attached base packages:
>         [1] stats     graphics  grDevices utils     datasets  methods   base
>
>         other attached packages:
>         [1] BiocInstaller_1.12.0
>
>         loaded via a namespace (and not attached):
>            [1] AnnotationDbi_1.24.0   Biobase_2.22.0
>         BiocGenerics_0.8.0
>            [4] biomaRt_2.18.0         Biostrings_2.30.1      bitops_1.0-6
>            [7] BSgenome_1.30.0        DBI_0.2-7
>           GenomicFeatures_1.14.2
>         [10] GenomicRanges_1.14.3   IRanges_1.20.6         parallel_3.0.0
>         [13] RCurl_1.95-4.1         Rsamtools_1.14.2       RSQLite_0.11.4
>         [16] rtracklayer_1.22.0     stats4_3.0.0           tools_3.0.0
>         [19] XML_3.98-1.1           XVector_0.2.0          zlibbioc_1.8.0
>
>
>     _________________________________________________
>     Bioconductor mailing list
>     Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>     https://stat.ethz.ch/mailman/__listinfo/bioconductor
>     <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>     Search the archives:
>     http://news.gmane.org/gmane.__science.biology.informatics.__conductor <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
>
> --
> Reema Singh
> PhD Scholar
> Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> Jawaharlal Nehru University
> New Delhi-110067
> INDIA



More information about the Bioconductor mailing list