[BioC] qrqc with variable length of short reads? - readSeqFile could not handle a 2GB zipped file.

Thomas Girke thomas.girke at ucr.edu
Sat Jun 2 20:19:17 CEST 2012


Dear Vince,

Have you thought about supporting ShortReadQ objects from ShortRead in
your package? That way, users could randomly sample reads from large FASTQ
files with FastqSampler(), which would reduce the memory requirements and
speed up generating the really nice and useful quality plots of your
package. Right now this seems to be possible only by saving things back
to files (random sample with ShortRead -> save to file -> reload with
qrqc), which is not ideal, but perhaps there is a simpler solution
that I missed?
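
For reference, a minimal sketch of that file round trip, assuming both
ShortRead and qrqc are installed. The helper name sample_and_read() and
its arguments are hypothetical, and the default of one million reads is
only an illustrative choice. It also passes hash=FALSE, as Vince suggests
in the quoted message below.

```r
# Sketch only: draw a random subset of reads with ShortRead, write it to a
# temporary FASTQ file, and load that file with qrqc.
# sample_and_read() and its argument names are hypothetical.
sample_and_read <- function(fastq, n = 1e6, quality = "sanger") {
  # FastqSampler() draws up to n reads at random while streaming the file
  sampler <- ShortRead::FastqSampler(fastq, n = n)
  on.exit(close(sampler))
  reads <- ShortRead::yield(sampler)  # a ShortReadQ object

  # qrqc reads from files, so the sample goes through a temporary file
  tmp <- tempfile(fileext = ".fastq")
  on.exit(unlink(tmp), add = TRUE)
  ShortRead::writeFastq(reads, tmp, compress = FALSE)

  # hash = FALSE skips the duplicate-read hashing to save memory
  qrqc::readSeqFile(tmp, type = "fastq", quality = quality, hash = FALSE)
}
```

Something like qualPlot(sample_and_read("trimmed.fastq.gz")) would then
plot base quality for the sample; the file name here is a placeholder.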

Thomas

On Fri, Jun 01, 2012 at 08:55:53PM +0000, Vince Buffalo wrote:
> Hi SangChul,
> 
> By default, readSeqFile hashes a proportion of the reads to check how many are non-unique. Specify hash=FALSE to turn this off, and your memory usage will decrease.
> 
> Best,
> Vince
> 
> On Jun 1, 2012, at 1:23 PM, Sang Chul Choi <schoi at cornell.edu> wrote:
> 
> > Hi,
> > 
> > I am using qrqc to plot the base quality of a short-read FASTQ file. When the FASTQ file contains reads of the same length, readSeqFile can read it in (25 million 100 bp reads) with a couple of GB of memory. I then trimmed the 3' ends of the reads, which produces reads of variable length because base quality at the 3' end differs from read to read. When I tried to read in this second FASTQ file of variable-length reads, readSeqFile used up all 16 GB of memory while using no CPU at all. The readSeqFile help page mentions that it relies on some efficient code, but that code seems to fall apart when the reads differ in length.
> > 
> > I wish to see how the trimming changes the base-quality plots, so this is a problem. I am wondering if there is a way of sidestepping it.
> > 
> > Thank you,
> > 
> > SangChul
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 


