[BioC] a possible bug in the ShortRead package

Martin Morgan mtmorgan at fhcrc.org
Wed May 14 21:56:05 CEST 2014


On 05/14/2014 11:17 AM, Wang Peter wrote:
> The code works well on many data sets,
> but when I run it on the 12 lines below, I hit a problem.
>
> How does the function tell whether the scores use the Phred+33 or Phred+64 system?
>
> library(ShortRead)
> reads <- readFastq(fastqfile)
> seqs <- sread(reads)
> # the class of the quality object reflects the encoding readFastq chose
> score_sys <- data.class(quality(reads))
> cat("the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",
>     score_sys, "\n")
>
>
> The output is:
> the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is SFastqQuality
> but the data are actually FastqQuality (Phred+33):
>
> @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG
> GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGAACTGCGACATCTGTGTGCTCGAGCAAGGAATGAGTGGAAAG
> +
> BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFFFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB
> @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG
> CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATCGAGTCAACCTCTACCTGCATTAGGTGACTGCTGAGAGCTCC
> +
> BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFFFFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF
> @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG
> GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGAAGAGTCATGTTTGTGACGAGAGAACTGGCTACAGCTCCAGG
> +
> BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF
>
>

From the help page

   ?readFastq

the 'qualityType' argument is described as

           qualityType: Representation to be used for quality scores,
               must be one of 'Auto' (choose Phred-like if any character
               is ASCII-encoded as less than 59) 'FastqQuality'
               (Phred-like encoding), 'SFastqQuality' (Illumina
               encoding).

'Auto' is the default. None of the quality characters in your file has an ASCII
code less than 59 (the lowest is 'B', code 66), so 'Auto' chooses SFastqQuality.
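
To make the 'Auto' rule concrete, here is a minimal base-R sketch of the check as the help page describes it. The helper guess_quality_type() is hypothetical and written only for illustration; it is not a ShortRead function and may differ from what ShortRead does internally.

```r
# Hypothetical helper (not part of ShortRead): mimic the documented 'Auto' rule,
# i.e. choose the Phred-like class if any quality character has ASCII code < 59.
guess_quality_type <- function(quality_strings) {
    # ASCII codes of every quality character across all reads
    codes <- utf8ToInt(paste(quality_strings, collapse = ""))
    if (any(codes < 59)) "FastqQuality" else "SFastqQuality"
}

# Leading characters of two quality strings from the original post
high_only <- c("BBBFFFFFFFFFFFFIIII", "BBBFFFFFFFFFFIIIIII")
range(utf8ToInt(paste(high_only, collapse = "")))   # 66 73
guess_quality_type(high_only)                       # "SFastqQuality"

# A single character below ASCII 59 (here '5', code 53) flips the guess
guess_quality_type(c(high_only, "BBB5FFF"))         # "FastqQuality"
```

A file that contains only high-quality reads therefore carries no character that would distinguish the two encodings under this rule.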

If you know the encoding, state it explicitly when reading the file,

   readFastq(fastqfile, qualityType="FastqQuality")
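
Why the choice matters can be sketched in base R, assuming the conventional offsets of 33 (Phred+33) and 64 (Phred+64). The helper decode_quality() is hypothetical, for illustration only, and is not a ShortRead function.

```r
# Hypothetical helper (not part of ShortRead): decode one quality string
# by subtracting the encoding offset from each character's ASCII code.
decode_quality <- function(quality_string, offset) {
    utf8ToInt(quality_string) - offset
}

q <- "BFI"   # the three distinct quality characters in the posted reads
decode_quality(q, 33)   # 33 37 40  (read as Phred+33)
decode_quality(q, 64)   #  2  6  9  (same characters read as Phred+64)
```

The same characters decode to scores that differ by 31, so a wrong guess makes good reads look poor in any downstream quality filtering.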

See this previous post

   https://stat.ethz.ch/pipermail/bioconductor/2012-September/048172.html

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


