[BioC] a program to force fastq using pred+33 scoring system

Martin Morgan mtmorgan at fhcrc.org
Tue May 27 02:08:53 CEST 2014


On 05/26/2014 04:57 PM, Wang Peter wrote:
> dear Martin:
>   thank you for your help on the some codiing.
> but i have a question about the function
>
> as(quality(reads), "numeric")
>
> the function quality must automatically identify the score system, phred+33 and
> phred+64.
>
> that is good on some cases, but if i processed the data in chunk.
> maybe a chunk of data have the scores above 59, but they are phred+33
> i am afraid it will lead a wronge identification

The right thing to do is to create the appropriate quality object when the data 
is read in (using the qaulityType argument to readFastq) or to use the correct 
quality type when using ShortReadQ(). If neither of these is possible (I don't 
know why they would not be?) then you could

   as(FastqQuality(quality(quality(reads))), "numeric")

Also, I changed the 'Auto' rules for readFastq so that phred+64 is only chosen 
when all characters are encoded greater than 58 (':') and some are greater than 
74 ('J'). This is in the 'devel' version, 1.23.11.

>
> can i set a parameter to tell the function as(quality(reads), "numeric") ,
> the score sys is
> phred+33 or 64
>
> thank you
> shan
>
>
>
> On Tue, May 20, 2014 at 1:23 AM, Martin Morgan <mtmorgan at fhcrc.org
> <mailto:mtmorgan at fhcrc.org>> wrote:
>
>     On 05/19/2014 06:49 AM, Wang Peter wrote:
>
>         i am sorry, i have another question about the coding
>             phred64 = rawToChar(as.raw(64:104))
>             phred33 = rawToChar(as.raw(33:73))
>         qual = chartr(phred64, phred33, quality(quality(reads)))
>         how did it deal with quality score from 59 to 63?
>
>
>     I did not include them; expand the phred64 and phred33 strings to reflect
>     the desired mapping.
>
>     Martin
>
>
>
>         On Mon, May 19, 2014 at 9:32 PM, Wang Peter <wng.peter at gmail.com
>         <mailto:wng.peter at gmail.com>
>         <mailto:wng.peter at gmail.com <mailto:wng.peter at gmail.com>>> wrote:
>
>              thank you very much
>              you coding is so beautiful, mine is ugly
>
>
>              On Mon, May 19, 2014 at 8:03 PM, Martin Morgan <mtmorgan at fhcrc.org
>         <mailto:mtmorgan at fhcrc.org>
>              <mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>>> wrote:
>
>                  On 05/14/2014 02:56 PM, Wang Peter wrote:
>
>                      i worte a program to automatically dectect the scoring
>         system in
>                      fastq file
>                      and if pred+64, it should be changed into pred+33
>
>
>                  I would make the relevant alphabets
>
>                     phred64 = rawToChar(as.raw(64:104))
>                     phred33 = rawToChar(as.raw(33:73))
>
>                  translate the character representation of the underling BStringSet
>
>                     qual = chartr(phred64, phred33, quality(quality(reads)))
>
>                  and tag the result as phred+33
>
>                     ShortReadQ(sread(reads), FastqQuality(qual))
>
>                  To get a numeric representation of the quality encoding, I would
>
>                     as(quality(reads), "numeric")
>
>                  or for a rectangular, NA-padded representation
>
>                     as(quality(reads), "matrix")
>
>                  Martin
>
>
>                      please help me make sure it is right
>
>                      rm(list=ls())
>                      fastqfile="1.fastq";
>                      working_dir="d:/working_dir"
>                      setwd(working_dir)
>                      library(ShortRead);
>                      reads <- readFastq(fastqfile);
>                      seqs <- sread(reads);
>                      score_sys = data.class(quality(reads));
>                      cat("the quality score system
>                      (SFastqQuality=Phred+64,____FastqQuality=Phred+33)
>         is",score_sys,"\n")
>
>                      raw_len <- max(width(reads));
>
>                      qual <- quality(quality(reads));
>                      myqual_16L <- charToRaw(as.character(unlist(____qual)));
>
>                      if(score_sys =="FastqQuality")
>                      {
>                         myqual_10L <- strtoi(myqual_16L,16L)-33;   # this is for
>         other use
>                      }
>
>                      if(score_sys =="SFastqQuality")
>                      {
>                         myqual_10L <- strtoi(myqual_16L,16L)-64;
>                         qual_temp <- PhredQuality (as.integer(myqual_10L));
>                         qual <- BStringSet(unlist(qual_temp), start= seq(from =
>         1, to =
>                      raw_len*(length(reads)-1)+1, by = raw_len), width=raw_len);
>
>                      }
>                      reads <- ShortReadQ(sread=seqs, quality=qual ) );
>
>
>
>                  --
>                  Computational Biology / Fred Hutchinson Cancer Research Center
>                  1100 Fairview Ave. N.
>                  PO Box 19024 Seattle, WA 98109
>
>                  Location: Arnold Building M1 B861
>                  Phone: (206) 667-2793 <tel:%28206%29%20667-2793>
>         <tel:%28206%29%20667-2793>
>
>
>
>
>
>              --
>              shan gao
>              Room 231(Dr.Fei lab)
>              Boyce Thompson Institute for Plant Research
>              Cornell University
>              Tower Road, Ithaca, NY 14853-1801
>              Office phone: 1-607-254-1267 <tel:1-607-254-1267>
>         <tel:1-607-254-1267 <tel:1-607-254-1267>>(day)
>              Official email:sg839 at cornell.edu <mailto:email%3Asg839 at cornell.edu>
>         <mailto:email%3Asg839 at cornell.__edu <mailto:email%253Asg839 at cornell.edu>>
>
>              Facebook:http://www.facebook.__com/profile.php?id=__100001986532253
>         <http://www.facebook.com/profile.php?id=100001986532253>
>
>
>
>
>         --
>         shan gao
>         Room 231(Dr.Fei lab)
>         Boyce Thompson Institute for Plant Research
>         Cornell University
>         Tower Road, Ithaca, NY 14853-1801
>         Office phone: 1-607-254-1267 <tel:1-607-254-1267>(day)
>         Official email:sg839 at cornell.edu <mailto:email%3Asg839 at cornell.edu>
>         <mailto:email%3Asg839 at cornell.__edu <mailto:email%253Asg839 at cornell.edu>>
>         Facebook:http://www.facebook.__com/profile.php?id=__100001986532253
>         <http://www.facebook.com/profile.php?id=100001986532253>
>
>
>
>     --
>     Computational Biology / Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N.
>     PO Box 19024 Seattle, WA 98109
>
>     Location: Arnold Building M1 B861
>     Phone: (206) 667-2793 <tel:%28206%29%20667-2793>
>
>
>
>
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute for Plant Research
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu <mailto:email%3Asg839 at cornell.edu>
> Facebook:http://www.facebook.com/profile.php?id=100001986532253


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list