[BioC] Problems Counting Reads with summarizeOverlaps
vobencha at fhcrc.org
Sun Dec 8 15:55:53 CET 2013
Are you getting any error messages related to seqlevels when you count?
If you have confirmed that annotation seqlevels match the bam files next
look at the maximum possible overlap in a single bam file.
bf <- singleBamFile
reads <- readGAlignments(bf) ## assuming single-end reads
so <- summarizeOverlaps(tx, reads, mode=Union, inter.feature=FALSE)
'inter.feature=FALSE' with 'mode=Union' is countOverlaps with
'type=ANY'. Reads that hit multiple features will be double counted in
this case. The idea to to see if there are any common regions at all.
If there are still no counts, you could look more closely at a smaller
chromosome region in 'reads' where you would expect counts. This visual
inspection might give you some insight as to why there is no overlap.
On 12/05/13 18:51, Sam McInturf wrote:
> Hello Bioconductors,
> I am working on an Arabidopsis RNA seq set, and after executing my count
> reads script, I get a count table with no reads counted (ie
> sum(colSums(mat)) = 0).
> I mapped my reads using Tophat2 (without supplying a gtf/gff file) and used
> samtools to sort and index my accepted_hits.bam files. I loaded these bam
> files into IGV (integrated Genome Browser) and the reads appear to have
> been mapped (additionally the align_summary.txt says I have good mapping).
> Script and session info below.
> Essentially I use Rsamtools to load in my bamfiles, and then
> makeTranscriptsFromGFF to make a txdb object, transcriptsBy to get
> transcripts, and then summarizeOverlaps to count the reads. My gff file is
> TAIR10. I am relatively sure that the genome I mapped to was the TAIR10
> release, but I am not 100% on that.
> I have also used biomaRt to do this (commented out), but I had the same
> I am currently upgrading to R 3.0.2 and re-installing bioconductor (maybe
> it will change something?)
> Thanks for any thoughts!
> gffdir <- "/data/smcintur/Annotation/TAIR10_GFF3_genes.gff";
> bf <- list.files(pattern = ".bam$")
> bi <- list.files(pattern = "bam.bai$")
> bfl <- BamFileList(bf, bi)
> #mart <- makeTranscriptDbFromBiomart(biomart = "Public_TAIRV10", dataset=
> tx <- transcriptsBy(mart)
> gff <- makeTranscriptDbFromGFF(gffdir, format = "gff")
> cds <- cdsBy(gff)
> tx <- transcriptsBy(gff)
> olapTx <- summarizeOverlaps(tx, bfl)
> olapCds <- summarizeOverlaps(cds, bfl)
> txMat <- assays(olapTx)$counts
> cdsMat <- assays(olapCds)$counts
> write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
> write.table(cdsMat, file = "countMatCdsGff.txt", sep = "\t")
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>  LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>  LC_PAPER=C LC_NAME=C
>  LC_ADDRESS=C LC_TELEPHONE=C
>  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
>  stats graphics grDevices utils datasets methods base
> other attached packages:
>  BiocInstaller_1.12.0
> loaded via a namespace (and not attached):
>  AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0
>  biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
>  BSgenome_1.30.0 DBI_0.2-7 GenomicFeatures_1.14.2
>  GenomicRanges_1.14.3 IRanges_1.20.6 parallel_3.0.0
>  RCurl_1.95-4.1 Rsamtools_1.14.2 RSQLite_0.11.4
>  rtracklayer_1.22.0 stats4_3.0.0 tools_3.0.0
>  XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0
More information about the Bioconductor