[BioC] ShortRead - readAligned() with bowtie & qual

Marc Noguera mnoguera at imppc.org
Wed Jan 20 12:36:33 CET 2010


Dear list,
I am trying to do some quality assessment on Solexa runs using
Bioconductor and ShortRead.
I am using bowtie as the mapper, which yields bowtie-formatted output with
FASTQ quality scores for each alignment, such as:
> HWUSI-EAS621_91022_1_100_1938_1667 +   chr15   53573544   
> CAGTCTCCCAAAGTACTGGGATAATAGGTGTGAGACTCC
> DPYWYWYYWWWWPWWYWTVWWYWWWYYWYWXWBBBBBBB 0   34:C>A,36:A>T
>  HWUSI-EAS621_91022_1_100_1938_1823 -   chr18   34747447   
> ACCCGGGAGTTGGGCTGCTTAGTGGCTGGACTCTCTTCC
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0   34:T>G
>  HWUSI-EAS621_91022_1_100_1938_608  +   chr19   35665132   
> CAGCTGCTCAGGAGGCTGAGGCAGGAGAATCGCTTGAGC
> DMTTTSRSTUTTTTTUTTTTTTTTTTQSSBBBBBBBBBB 2
>  HWUSI-EAS621_91022_1_100_1938_1207 +   chr22   30069585   
> TCTGGGCCGTGGGGAGGCTCCTCCTTGGCTGATGGCGCC
> DMTUTTRUTPTSTTUUUTSSTTUTBBBBBBBBBBBBBBB 0   35:T>C,37:A>C
>  HWUSI-EAS621_91022_1_100_1938_222  -   chr20   61020239   
> GCCTGGGCCTCCCGAAGTGCTGTGGTTACAGGCATGAGC
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 2   25:A>G,34:C>G
>  HWUSI-EAS621_91022_1_100_1938_1562 +   chr15   84916971   
> TGGGTTTCACCATGGTGGCCAGGCTGGTCTCAAACTCCT
> DNUVUWWWWWWWWWWWWWUWWWWWWWVBBBBBBBBBBBB 0
>  HWUSI-EAS621_91022_1_100_1938_1290 -   chr9    120742911  
> AGCCCAAGAGAGCCTTCTCCTCGACCATTACCACCAATG
> BBBBBBBBBBBBBBBBSWRLPWUWRSWUKTWXXWXWWND 0   33:C>A,35:T>C
When I read this file with readAligned():
> aln <- readAligned("/path/", pattern="test.fastq.bwt", type="Bowtie", qualityType='FastqQuality')

I obtain an AlignedRead object, which includes quality data:
> > quality(aln)
> class: SFastqQuality
> quality:
>   A BStringSet instance of length 3331015
>           width seq
>       [1]    35 BBB=B?:AA:@?@>?B@@AA@@A;>@4>>7922=>
>       ...   ... ...
> [3331015]    33 %%/<<<1;:<<:<<<<995<<<:<::<<<:<<<
However, when I try to use these qualities for plotting, I obtain NA values:
> > alignQuality(aln)
> class: NumericQuality
> quality: NA NA ... NA NA (3331015 total)
So, I guess there is some kind of problem when converting the ASCII
characters to numerical quality values. I have also tried reading the
input with the SFastqQuality type, with no success.
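
To illustrate the conversion I have in mind, here is a minimal sketch in base R. decode_quality is a hypothetical helper of mine, not a ShortRead function, and the offsets 33 and 64 are the usual Sanger and Solexa/Illumina conventions, which I am only assuming apply to this data.

```r
# Hypothetical helper (not part of ShortRead): convert one ASCII-encoded
# quality string into integer scores by subtracting an assumed offset.
decode_quality <- function(qual, offset) {
  utf8ToInt(qual) - offset
}

# Quality string copied from the first bowtie record quoted above
q <- "DPYWYWYYWWWWPWWYWTVWWYWWWYYWYWXWBBBBBBB"

phred64 <- decode_quality(q, 64L)  # assumed Solexa/Illumina-style offset
phred33 <- decode_quality(q, 33L)  # assumed Sanger-style offset

# Compare the score ranges obtained under each assumption
print(range(phred64))
print(range(phred33))
```

If the offset-33 range comes out well above the scores normally seen for Sanger-encoded reads, that would suggest the file uses the offset-64 encoding instead.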

What am I doing wrong?

Thanks in advance,
Marc

-- 

-----------------------------------------------------
Marc Noguera i Julian, PhD
Genomics unit / Bioinformatics
Institut de Medicina Preventiva i Personalitzada
del Càncer (IMPPC)
B-10 Office
Carretera de Can Ruti
Camí de les Escoles s/n
08916 Badalona, Barcelona


