[BioC] a possible bug in the ShortRead package

Martin Morgan mtmorgan at fhcrc.org
Wed May 14 22:03:47 CEST 2014


On 05/14/2014 01:01 PM, Wang Peter wrote:
> Thank you very much.
> I think this method is not reliable.
> If the data are high quality, no nucleotide has a quality character with an ASCII code below 59,
> so the scores will be treated as SFastqQuality.
>
> I would like it to check whether some score is higher than 80, and only then choose SFastqQuality.

Why 80? Maybe it is better to force an explicit choice. Is there a better standard than

http://en.wikipedia.org/wiki/FASTQ_format
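
To illustrate why no single cutoff settles this, here is a minimal sketch of a range-based guess in base R. guess_offset is a hypothetical helper, not part of ShortRead. The lower threshold (59) is the one quoted from ?readFastq in this thread; the upper threshold (74) is an assumption that Phred+33 data never exceeds Q41, which is typical of Illumina output but not guaranteed by the format.

```r
# Hypothetical helper (not a ShortRead function): guess the FASTQ quality
# offset from the range of ASCII codes in a character vector of quality strings.
guess_offset <- function(quals) {
    codes <- utf8ToInt(paste(quals, collapse = ""))
    if (length(codes) == 0L) stop("no quality characters supplied")
    if (min(codes) < 59L) {
        "Phred+33"      # codes this low cannot occur with a +64 offset
    } else if (max(codes) > 74L) {
        "Phred+64"      # assumption: Phred+33 data never exceeds Q41 ('J')
    } else {
        "ambiguous"     # every code fits both encodings
    }
}

# Start of the first quality line quoted in this thread: only B, F and I.
guess_offset("BBBFFFFFFFFFFFFIIIIIIIIIIIIIII")   # "ambiguous"
guess_offset("#4=DDFFFHHHHHJJJ")                 # "Phred+33"
guess_offset("ffffhhhhgggg")                     # "Phred+64"
```

The quality characters in the example file span ASCII 66 ('B') to 73 ('I'), which is valid under both offsets. A cutoff of 80 would behave like the upper test here, but it would still leave such a file undecided, so passing qualityType explicitly remains the safer option.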


>
>
> On Thu, May 15, 2014 at 3:56 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>     On 05/14/2014 11:17 AM, Wang Peter wrote:
>
>         The code works well on many data sets,
>         but when I ran it on a 12-line file I met the following problem.
>
>         How can the function tell whether the scores use the Phred+33 or the Phred+64 system?
>
>         library(ShortRead)
>         reads <- readFastq(fastqfile)
>         seqs <- sread(reads)
>         score_sys <- data.class(quality(reads))
>         cat("the quality score system",
>             "(SFastqQuality=Phred+64,FastqQuality=Phred+33) is", score_sys, "\n")
>
>
>         the output is:
>         the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is SFastqQuality
>         but the file really is FastqQuality (Phred+33):
>
>         @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG
>         GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGAACTGCGACATCTGTGTGCTCGAGCAAGGAATGAGTGGAAAG
>         +
>         BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFFFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB
>         @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG
>         CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATCGAGTCAACCTCTACCTGCATTAGGTGACTGCTGAGAGCTCC
>         +
>         BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFFFFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF
>         @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG
>         GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGAAGAGTCATGTTTGTGACGAGAGAACTGGCTACAGCTCCAGG
>         +
>         BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF
>
>
>
>      From the help page
>
>        ?readFastq
>
>     the 'qualityType' argument is described as
>
>                qualityType: Representation to be used for quality scores,
>                    must be one of 'Auto' (choose Phred-like if any character
>                    is ASCII-encoded as less than 59) 'FastqQuality'
>                    (Phred-like encoding), 'SFastqQuality' (Illumina
>                    encoding).
>
>     'Auto' is the default; none of the ASCII-encoded quality characters in your
>     file is less than 59, hence SFastqQuality is chosen.
>
>     Invoke the command with the information about encoding if known,
>
>        readFastq(fastqfile, qualityType="FastqQuality")
>
>     See this previous post
>
>     https://stat.ethz.ch/pipermail/bioconductor/2012-September/048172.html
>
>     --
>     Computational Biology / Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N.
>     PO Box 19024 Seattle, WA 98109
>
>     Location: Arnold Building M1 B861
>     Phone: (206) 667-2793
>
>
>
>
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute for Plant Research
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email: sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


