[BioC] problem in reading bam files with EDASeq

Valerie Obenchain vobencha at fhcrc.org
Fri Jan 20 21:40:19 CET 2012


Hi Raffaele,

It looks like you've identified a bug in EDASeq and the author is cc'd.

In the meantime, if you're after an assessment of the read quality have 
you tried using the qa report in ShortRead?

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
qaSummary <- qa(fl, type="BAM")
report(qaSummary, dest="/path/to/report_directory")

This report can then be viewed in your browser.


Valerie


On 01/11/12 03:55, rcaloger wrote:
> Hi,
> I an trying to use EDASeq for Read-level data.
> My bam file was generated by bowtie mapping of reads against human 
> hg19 chr22, sam were then converted in bam using picard tool and 
> indexing was done with Rsamtools.
> If I try to using plotQuality:
> > bfs
> BamFileList of length 4
> > bfs[[1]]
> class: BamFile
> path: 
> H:/data/calogero/bioC-devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1.fastq-breast1_R2.fastq.hs_chr22.bam
> index: 
> H:/data/calogero/bioC-devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1.fastq-breast1_R2.fastq.hs_chr22.bam
> isOpen: FALSE
> > plotQuality(bfs[[1]])
> Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) :
>   subscript contains NAs
> >
>
> I investigated the problem and I found out that the problems comes out
> at the line of setMethod plotQuality
> quality <- as(fq[strand=="+"], "matrix")
> It seems that the problem is related to the presence of three types of 
> codes in mapped strands: +, - and <NA>
> unique(strand)
> [1] +    - <NA>
> Levels: + - *
> > length(strand)
> [1] 20044478
> > length(strand[which(strand=="-")])
> [1] 143352
> > length(strand[which(strand=="+")])
> [1] 143352
> > length(strand[is.na(strand)])
> [1] 19757774
> >
> The 19757774 sequences that provide the error are those unmapped where 
> it is not possible to assign a strand
>
> #If I remove unmapped quality scores:
> > fq <- fq[!is.na(strand)]
> #and if I remove <NA> from stran
> > strand <- strand[!is.na(strand)]
> #The problem is solved
> > quality <- as(fq[strand=="+"], "matrix")
> >str(quality)
>  int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70
>
> So I tried:
> plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE))
> but I still get:
>  Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) :
>   subscript contains NAs
>
>
> Any suggestion?
>
> Many thanks
> Raffaele
>



More information about the Bioconductor mailing list