[BioC] shortread quality

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 6 21:11:43 CEST 2012


Hi,

On 06/06/2012 08:00 AM, David martin wrote:
> Hi,
> I'm reading a fastq file from the solexa sequencer.
> I would like to know how many reads have a phred score (>Q29). The thing

If you mean the average base quality score, then

   fq <- readFastq(sp, fqpattern)
   score <- alphabetScore(fq)

gives the sum of the base quality scores for each read, so is a vector 
as long as the length of the reads. The average is

   aveScore <- score / width(fq)

and then you're in the realm of familiar R again, e.g.,

   hist(aveScore)
   table(aveScore > 29)

etc.

Hope that heps,

Martin



> is that i get the densities so i don't really know how many reads from
> the total pass that filter. It's probaly easy for you so any hint would
> be helpful
>
> library("ShortRead")
> fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt"
>
> path = getwd()
> sp <- SolexaPath(path,dataPath=path,analysisPath=path)
>
> # Read fastq File and save report
> fq <- readFastq(sp, fqpattern)
> qaSummary <- qa(fq,fqpattern)
> save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" )))
> report(qaSummary,dest="report")
>
> #Quality
>
> idx = which(qaSummary[["readQualityScore"]]["quality"] > 29)
> a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] ,
> qaSummary[["readQualityScore"]][idx,"density"])
> a #reads with a quality >Q29
>
> #How to get the total number ? or percent compared to the total number
> of reads ?
>
> thanks
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list