[BioC] problem in reading bam files with EDASeq

rcaloger raffaele.calogero at gmail.com
Wed Jan 11 12:55:11 CET 2012


Hi,
I an trying to use EDASeq for Read-level data.
My bam file was generated by bowtie mapping of reads against human hg19 
chr22, sam were then converted in bam using picard tool and indexing was 
done with Rsamtools.
If I try to using plotQuality:
 > bfs
BamFileList of length 4
 > bfs[[1]]
class: BamFile
path: 
H:/data/calogero/bioC-devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1.fastq-breast1_R2.fastq.hs_chr22.bam
index: 
H:/data/calogero/bioC-devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1.fastq-breast1_R2.fastq.hs_chr22.bam
isOpen: FALSE
 > plotQuality(bfs[[1]])
Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) :
   subscript contains NAs
 >

I investigated the problem and I found out that the problems comes out
at the line of setMethod plotQuality
quality <- as(fq[strand=="+"], "matrix")
It seems that the problem is related to the presence of three types of 
codes in mapped strands: +, - and <NA>
unique(strand)
[1] +    - <NA>
Levels: + - *
 > length(strand)
[1] 20044478
 > length(strand[which(strand=="-")])
[1] 143352
 > length(strand[which(strand=="+")])
[1] 143352
 > length(strand[is.na(strand)])
[1] 19757774
 >
The 19757774 sequences that provide the error are those unmapped where 
it is not possible to assign a strand

#If I remove unmapped quality scores:
 > fq <- fq[!is.na(strand)]
#and if I remove <NA> from stran
 > strand <- strand[!is.na(strand)]
#The problem is solved
 > quality <- as(fq[strand=="+"], "matrix")
 >str(quality)
  int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70

So I tried:
plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE))
but I still get:
  Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) :
   subscript contains NAs


Any suggestion?

Many thanks
Raffaele

-- 

----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
MBC Centro di Biotecnologie Molecolari
Via Nizza 52, Torino 10126
tel.   ++39 0116706457
Fax    ++39 0112366457
Mobile ++39 3333827080
email: raffaele.calogero at unito.it
        raffaele[dot]calogero[at]gmail[dot]com
www:   http://www.bioinformatica.unito.it



More information about the Bioconductor mailing list