[BioC] qrqc with variable length of short reads? - readSeqFile could not handle a 2GB zipped file.

Fri Jun 1 22:55:53 CEST 2012

Hi SangChul,

By default, readSeqFile hashes a proportion of the reads to check whether many of them are non-unique. Specify hash=FALSE to turn this off, and your memory usage will decrease.
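A minimal sketch of that call, in case it helps. The three-read FASTQ file written to a temporary path is made up purely for illustration and stands in for the trimmed file; the qrqc calls are skipped if the package is not installed.

```r
# Hypothetical stand-in for the trimmed FASTQ file: three reads of
# different lengths, as 3' quality trimming would produce.
fastq_path <- tempfile(fileext = ".fastq")
writeLines(c(
  "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
  "@read2", "ACGTAC",     "+", "IIIIII",
  "@read3", "ACG",        "+", "III"
), fastq_path)

if (requireNamespace("qrqc", quietly = TRUE)) {
  library(qrqc)
  # hash = FALSE turns off the hashing of reads used for the uniqueness check.
  trimmed <- readSeqFile(fastq_path, hash = FALSE)
  # Plot base quality by position for the variable-length reads.
  print(qualPlot(trimmed))
}
```

For the real data, replace fastq_path with the path to the trimmed file.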

Best,
Vince

On Jun 1, 2012, at 1:23 PM, Sang Chul Choi <schoi at cornell.edu> wrote:

> Hi,
> 
> I am using qrqc to plot the base quality of a short-read FASTQ file. When all reads in the FASTQ file have the same length, readSeqFile can read the file (25 million 100 bp reads) with a couple of GB of memory. I then trimmed the 3' ends of the reads, which produces reads of variable length because base quality at the 3' end differs from read to read. When I tried to read this second FASTQ file of variable-length reads, readSeqFile used up all 16 GB of memory while using no CPU at all. The readSeqFile help page mentions some efficient code, but it seems to fall apart when the reads differ in length.
> 
> I want to see how the trimming changes the base-quality plots, so this is a problem. Is there a way to sidestep it?
> 
> Thank you,
> 
> SangChul