[BioC] shortread quality

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 6 22:34:48 CEST 2012


On 06/06/2012 12:11 PM, Martin Morgan wrote:
> Hi,
>
> On 06/06/2012 08:00 AM, David martin wrote:
>> Hi,
>> I'm reading a fastq file from the solexa sequencer.
>> I would like to know how many reads have a phred score (>Q29). The thing
>
> If you mean the average base quality score, then
>
> fq <- readFastq(sp, fqpattern)
> score <- alphabetScore(fq)
>
> gives the sum of the base quality scores for each read, so is a vector
> as long as the length of the reads. The average is
>
> aveScore <- score / width(fq)
>
> and then you're in the realm of familiar R again, e.g.,
>
> hist(aveScore)
> table(aveScore > 29)
>
> etc.
>
> Hope that heps,

I guess the qa object already gets you further, as you've indicated

   df <- qaSummary[["readQualityScore"]]

the 'density' column (apparently not really a density) could be turned 
into a cumulative density

   cdensity <- cumsum(df$density) / sum(df$density)

and then look up the cumulative density nearest the quality that you're 
interested in

   cdensity[findInterval(29, df$quality)]

You'd want to do these steps separately for each lane, if there were 
several in df.

Martin



>
> Martin
>
>
>
>> is that i get the densities so i don't really know how many reads from
>> the total pass that filter. It's probaly easy for you so any hint would
>> be helpful
>>
>> library("ShortRead")
>> fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt"
>>
>> path = getwd()
>> sp <- SolexaPath(path,dataPath=path,analysisPath=path)
>>
>> # Read fastq File and save report
>> fq <- readFastq(sp, fqpattern)
>> qaSummary <- qa(fq,fqpattern)
>> save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" )))
>> report(qaSummary,dest="report")
>>
>> #Quality
>>
>> idx = which(qaSummary[["readQualityScore"]]["quality"] > 29)
>> a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] ,
>> qaSummary[["readQualityScore"]][idx,"density"])
>> a #reads with a quality >Q29
>>
>> #How to get the total number ? or percent compared to the total number
>> of reads ?
>>
>> thanks
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list