[BioC] summarizeOverlaps colData does NOT contain countBam() summary data

Martin Morgan mtmorgan at fhcrc.org
Sat Feb 1 16:39:31 CET 2014


Thanks Malcolm. The documentation at this point is not accurate: colData only holds the countBam() summary when the parameter count.mapped.reads=TRUE is set. The parameter _is_ documented on ?BamFile, and the summarizeOverlaps documentation has been clarified in devel (where summarizeOverlaps is in the new package 'GenomicAlignments').

Martin
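
A minimal sketch of what setting the parameter looks like, assuming the devel layout (summarizeOverlaps in 'GenomicAlignments') and the example files from pasillaBamSubset. The two-range 'features' object is made up for illustration and is not taken from ?summarizeOverlaps; in the release the original poster is running, summarizeOverlaps is in GenomicRanges and the argument may behave differently.

```r
library(GenomicAlignments)   # devel home of summarizeOverlaps (assumption: devel layout)
library(Rsamtools)           # BamFileList, countBam
library(pasillaBamSubset)    # small example BAM files restricted to chr4

## Two example BAM files shipped with pasillaBamSubset
fls <- c(untreated1_chr4(), untreated3_chr4())
bfl <- BamFileList(fls)

## Hypothetical features: two arbitrary windows on chr4, for illustration only
features <- GRanges("chr4", IRanges(start = c(1, 500001), width = 500000))

## Without count.mapped.reads=TRUE the colData has no countBam() columns
se0 <- summarizeOverlaps(features, bfl)
colData(se0)

## With count.mapped.reads=TRUE the countBam() summary is added to colData
se <- summarizeOverlaps(features, bfl, count.mapped.reads = TRUE)
colData(se)
```

With the flag set, colData(se) should carry the 'records', 'nucleotides' and 'mapped' columns described in the quoted help text below.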

----- Malcolm Cook <MEC at stowers.org> wrote:
> Valerie and other Genomics,
> 
> I read in ?summarizeOverlaps that 
> 
>      'colData' is a DataFrame with columns of 'object' (class of
>      'reads') and 'records' (length of 'reads'). When 'reads' is a
>      BamFile or BamFileList the 'colData' holds the output of a call to
>      'countBam' with columns of 'records' (total records in file),
>      'nucleotides' and 'mapped'. The number in 'mapped' is the number
>      of records returned when 'isUnmappedQuery=FALSE' in the
>      'ScanBamParam'.
> 
> and also,
> 
>      ## When the reads are Bam files, the 'colData' contains summary 
>      ## information from a call to countBam().
> 
> However, I find this NOT to be true.  Viz (in a fresh R session)
> 
> >library(GenomicRanges)
> >example(summarizeOverlaps)
> ....
> > colData(se)
> DataFrame with 2 rows and 0 columns
> 
> # but yet:
> 
> > do.call(rbind,lapply(fls,countBam))
>                   space start end width              file records nucleotides
> sm_treated1.bam      NA    NA  NA    NA   sm_treated1.bam    1800       80260
> sm_untreated1.bam    NA    NA  NA    NA sm_untreated1.bam    1800      135000
> 
> Can you advise?
> 
> Thanks!
> 
> ~ Malcolm Cook 
> Computational Biology / Shilatifard Lab - Stowers Institute for Medical Research - Kansas City
> 
> 
> PS
> 
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices datasets  utils     methods   base     
> 
> other attached packages:
>  [1] edgeR_3.4.2                                limma_3.18.9                               DESeq_1.14.0                               lattice_0.20-24                            locfit_1.5-9.1                             TxDb.Dmelanogaster.UCSC.dm3.ensGene_2.10.1 GenomicFeatures_1.14.2                     AnnotationDbi_1.24.0                       Biobase_2.22.0                             pasillaBamSubset_0.0.8                     BiocInstaller_1.12.0                       Rsamtools_1.14.2                           Biostrings_2.30.1                         
> [14] GenomicRanges_1.14.4                       XVector_0.2.0                              IRanges_1.20.6                             BiocGenerics_0.8.0                        
> 
> loaded via a namespace (and not attached):
>  [1] annotate_1.40.0    biomaRt_2.18.0     bitops_1.0-6       BSgenome_1.30.0    DBI_0.2-7          genefilter_1.44.0  geneplotter_1.40.0 grid_3.0.2         RColorBrewer_1.0-5 RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.22.2 splines_3.0.2      stats4_3.0.2       survival_2.37-7    tools_3.0.2        XML_3.98-1.1       xtable_1.7-1       zlibbioc_1.8.0    
> >
> 


