[BioC] report a possble bug in shortread

Martin Morgan mtmorgan at fhcrc.org
Fri Nov 30 05:32:37 CET 2012


Hi --  If I make a plain text file 'test.fq' with just these four lines in it

 > id
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@ id
HHHHHHHHHHHHHHHHHHGHHHHHHDDCDD#?CCBCDDDCHEHHHHHHHHHHHFH7


and then read it in I get FastqQuality

 > data.class(quality(readFastq("/tmp/tmp.fq")))
[1] "FastqQuality"

Do you? The algorithm uses a heuristic and looks at the first 1000 reads

                 alf <- alphabetFrequency(head(quality, 1000),
                   collapse = TRUE)
                 if (any(alf) && min(which(alf != 0)) < 59) {
                   FastqQuality
                 } else SFastqQuality

so if you have high-quality reads initially it will guess wrong; I'll increase 
the number of reads looked at.

Martin

On 11/29/2012 07:48 PM, Wang Peter wrote:
> i read the fastq file and let the function identify the quality score
>
> 		if(iteration==1)#
> 		{
> 			score_sys = data.class(quality(reads));	
> 		}
> 		if(score_sys =="SFastqQuality")#Phred+64
> 		{
> 		  do something
> 		}
> 		if(score_sys =="FastqQuality")#Phred+33
> 		{
> 		 do something
> 		}
>
>
> but the score_sys=SFastqQuality, the data is Phred+33
>
> see the quality line
> HHHHHHHHHHHHHHHHHHGHHHHHHDDCDD#?CCBCDDDCHEHHHHHHHHHHHFH7
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list