[BioC] summarizeOverlaps with mapq filter?

Valerie Obenchain vobencha at fhcrc.org
Fri Jul 5 22:22:35 CEST 2013


Hi SangChul,

Unfortuantely ScanBamParam() does not support filtering on specific 
values of the sam fields.

You can subset on the TRUE/FALSE flags, such as the QC flag,

     ScanBamParam(flag=scanBamFlag(isNotPassingQualityControls=FALSE))

but it sounds like you really want to filter on the map quality. You 
could write a little a helper function to filter and count the bams.


     filterAndCountBam <- function(bf, features, mode, mapq_cutoff)
     {
         gal <- readGAlignments(bf, param=ScanBamParam(what="mapq"))
         reads <- gal[mcols(gal)$mapq > mapq_cutoff]
         summarizeOverlaps(features, reads, mode)
     }

lapply'ing over the list of bam files returns a list of 
SummarizedExperiments.

     gr <- GRanges(....)
     lst <- lapply(bamFileList, filterAndCountBam, features=gr,
                   mode=Union, mapq_cutoff=30)

Then cbind the SummarizedExperiments together to combine counts.

     do.call(cbind, lst)


Valerie




On 07/03/2013 06:24 PM, Sang Chul Choi wrote:
> Hi,
>
> I wonder if I could use "summarizeOverlaps" to count reads with mapping quality (or MAPQ) larger than some value, say, 30.  I took the example script in the function's help to specify what function signature I want to use.   I used to use "readBamGappedAlignments" to read a BAM file, and filter out reads with MAPQ less than a specified value, and use the filtered GappedAlignments object to count reads mapped on genomic features.  But, I like the feature of summarizeOverlaps that I can use to read a list of BAM files.  But, I do not know use the summarizeOverlaps function to use only reads with MAPQ larger than some specified value.
>
> If this is not possible currently, then I might have to create another BAM file of reads with MAPQ larger than some value and use the BAM file to count overlaps using summarizeOverlaps function.  I do not like this two-step procedure.
>
> Thank you for your help!
>
> SangChul
>
> ---------------------------------
>     library(Rsamtools)
>
>       fls <- list.files(system.file("extdata",package="GenomicRanges"),
>                         recursive=TRUE, pattern="*bam$", full=TRUE)
>       names(fls) <- basename(fls)
>       bf <- BamFileList(fls, index=character())
>       features <- GRanges(
>           seqnames = c(rep("chr2L", 4), rep("chr2R", 5), rep("chr3L", 2)),
>           ranges = IRanges(c(1000, 3000, 4000, 7000, 2000, 3000, 3600,
>                              4000, 7500, 5000, 5400),
>                            width=c(rep(500, 3), 600, 900, 500, 300, 900,
>                                    300, 500, 500)), "-",
>           group_id=c(rep("A", 4), rep("B", 5), rep("C", 2)))
>
>       se <- summarizeOverlaps(features, bf)
> ---------------------------------
>
> ===============================
> Sang Chul Choi Ph.D.
> Research Associate
>
> Sang Chul Choi (222 WRRB IAB)
> 902 Koyukuk Dr (PO Box 757000)
> 311 Irving I, University of Alaska Fairbanks
> Fairbanks, AK 99775-7000
>
> Office +1-907-474-5071
> Fax +1-907-474-6967
> Google Phone +1-607-542-9362
> ===============================
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list