[BioC] fastq upload time

Marc Noguera mnoguera at imppc.org
Wed Sep 22 16:59:38 CEST 2010

If you want to check quality and per-position base frequencies, I would
suggest randomly sampling the fastq file to extract, let's say, 5-10% of
the reads, and running the quality analysis on that subset.

Although absolute totals will not reflect the full file, most of the
summary statistics will still be informative, and computing them takes
far less memory/CPU.

I wrote some C code for this if you are interested.
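
That C code is not reproduced here. As a rough illustration of the
sampling idea only, a minimal Python sketch could look like the one below.
It assumes plain 4-line fastq records (no wrapped sequence lines), and the
function name sample_fastq, its parameters and the file names in the usage
note are all made up for the example.

```python
import random
from itertools import islice


def sample_fastq(in_path, out_path, fraction=0.1, seed=0):
    """Copy each 4-line FASTQ record to out_path with probability `fraction`.

    Hypothetical helper for illustration. The input is streamed record by
    record, so memory use does not grow with the size of the file.
    Returns (records_seen, records_kept).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    seen = kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            # Assumed layout: header, sequence, '+' line, quality string.
            record = list(islice(src, 4))
            if not record:
                break
            if len(record) < 4:
                raise ValueError("truncated FASTQ record at end of file")
            seen += 1
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return seen, kept
```

Something like sample_fastq("reads.fastq", "reads_sample.fastq", fraction=0.05)
would then write roughly 5% of the reads to a much smaller file, which
should be far easier to load into R for the quality analysis. Because each
record is kept independently, the exact number of reads in the sample
varies a little from run to run unless the seed is fixed.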

Alex Gutteridge wrote:
> On Wed, 22 Sep 2010 08:12:27 -0400, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
>> On Wed, Sep 22, 2010 at 8:07 AM, <Daniel.Berner at unibas.ch> wrote:
>>> Hi there
>>> I have a solexa fastq file containing some 27 million reads (file size
>>> approx. 4 GB). my plan is to upload this into R for subsequent editing
>>> with ShortRead tools. The R version is 64-bit linux, the computer has
>>> 8 GB RAM.
>>> Can anybody provide a rough estimate of how long the input will take?
>>> hours, days...?
>> Depending on disk and network speeds, perhaps a few minutes.  8GB is
>> pretty small, though.  You'll have to give it a try to see if it all
>> fits into memory.
>> Sean
> Yes, my experience with ShortRead and files this size was that 8GB was not
> enough. If it is compatible with your planned analysis I would split the
> file according to chromosome and work from those.

Marc Noguera i Julian, PhD
Genomics unit / Bioinformatics
Institut de Medicina Predictiva i Personalitzada
del Càncer (IMPPC)
B-10 Office
Carretera de Can Ruti
Camí de les Escoles s/n
08916 Badalona, Barcelona
