[BioC] writeFastq writing dashes instead of dots

Martin Morgan mtmorgan at fhcrc.org
Tue Feb 26 19:38:35 CET 2013


Hi Thomas --

On 02/25/2013 09:08 AM, Thomas Rensch wrote:
> Hi everyone,
>
> I am reading and writing fastq files and writeFastq just swaps dots ('.') for dashes ('-').
>
> Is this the desired behaviour of writeFastq and if so why? Otherwise could someone better at R development than I modify this?
>
> Example fastq:
>
>
> @HWI-EAS149_3:1:1:0:1956:0:1:0
> .A..................................
> +
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> @HWI-EAS149_3:1:1:0:173:0:1:0
> .T..................................
> +
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> @HWI-EAS149_3:1:1:0:47:0:1:0
> .T..................................
> +
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

The substitution actually happens on input:

 > sread(readFastq("tmp.fastq"))
   A DNAStringSet instance of length 3
     width seq
[1]    36 -A----------------------------------
[2]    36 -T----------------------------------
[3]    36 -T----------------------------------

because '.' is not a valid letter for a DNAStringSet (nor in the international 
standard, if I understand correctly...)

 > DNAStringSet(".")
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  :
   key 46 (char '.') not in lookup table
 > alphabet(DNAStringSet())
  [1] "A" "C" "G" "T" "M" "R" "W" "S" "Y" "K" "V" "H" "D" "B" "N" "-" "+"
 > DNA_ALPHABET
  [1] "A" "C" "G" "T" "M" "R" "W" "S" "Y" "K" "V" "H" "D" "B" "N" "-" "+"

and from ?DNA_ALPHABET

      This alphabet contains all letters from the IUPAC Extended Genetic
      Alphabet (see '?IUPAC_CODE_MAP') + the gap ('"-"') and the hard
      masking ('"+"') letters.
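
To see which characters in a set of reads fall outside that alphabet, something like the following base-R sketch might help. It is not Biostrings code: `invalid_letters` is a made-up helper name, and the alphabet is simply copied from the DNA_ALPHABET output above.

```r
# DNA alphabet as printed by DNA_ALPHABET in the session above.
dna_alphabet <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                  "V", "H", "D", "B", "N", "-", "+")

# Hypothetical helper (not part of Biostrings or ShortRead): return the
# distinct characters found in `seqs` that are not in `alphabet`.
# Sequences are upper-cased first, on the assumption that lower-case
# input letters are acceptable.
invalid_letters <- function(seqs, alphabet = dna_alphabet) {
    chars <- unique(unlist(strsplit(toupper(seqs), "", fixed = TRUE)))
    setdiff(chars, alphabet)
}

reads <- c(".A..", ".T..", "acgt-N")
invalid_letters(reads)   # "."

# The "key 46" in the error message is the character code of '.'
utf8ToInt(".")           # 46
```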

One possibility would be to add an option writeFastq(..., dashesASdots=FALSE); 
is that really a good idea?
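
In the meantime, a possible workaround is to post-process the file that writeFastq produced. The sketch below is base R only and rests on several assumptions: the file is uncompressed, every record is exactly four lines (no wrapped sequences), and every '-' in a sequence line was originally a '.'. The function name `dashes_to_dots` is hypothetical.

```r
# Hypothetical helper: rewrite '-' as '.' in the sequence line of each
# four-line FASTQ record, leaving id, '+' and quality lines untouched
# (so the dash in an id such as "@HWI-EAS149_3..." is preserved).
dashes_to_dots <- function(infile, outfile) {
    lines <- readLines(infile)
    # Assumes strict four-line records; stop otherwise.
    stopifnot(length(lines) %% 4 == 0)
    is_seq <- seq_along(lines) %% 4 == 2   # 2nd line of every record
    lines[is_seq] <- chartr("-", ".", lines[is_seq])
    writeLines(lines, outfile)
    invisible(outfile)
}

# Small self-contained demonstration on a temporary file.
demo_in <- tempfile(fileext = ".fastq")
demo_out <- tempfile(fileext = ".fastq")
writeLines(c("@HWI-EAS149_3:1:1:0:1956:0:1:0", "-A--", "+", "BBBB",
             "@HWI-EAS149_3:1:1:0:173:0:1:0",  "-T--", "+", "BBBB"),
           demo_in)
dashes_to_dots(demo_in, demo_out)
readLines(demo_out)
```

Note that this would also convert any genuine gap characters, so it only makes sense when the dashes are known to have come from dots in the input.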

Martin

>
>
> Thanks a lot,
> Thomas
>
> --
> Thomas Rensch
> PhD Student - Paul Flicek Group
> EMBL-EBI
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


