[BioC] fastq upload time
mnoguera at imppc.org
Wed Sep 22 16:59:38 CEST 2010
If you want to check quality and per-position base frequencies, I would
suggest randomly sampling the FASTQ file to extract, say, 5-10% of the
reads and running the quality analysis on that subset.
Although global counts won't match the full file, most of the statistics
will still be informative, at a fraction of the memory/CPU cost.
I wrote some C code for this, if you are interested.
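The sampling idea can be sketched in Python. This is not the C code mentioned above, only an illustration under stated assumptions:

- Records are plain four-line FASTQ.
- Each read is kept independently with probability `fraction` (Bernoulli sampling), so the subset size is only approximately 5-10%.
- Qualities are decoded with a Phred+33 offset. Older Solexa/Illumina pipelines used +64, so `offset` may need changing.

The function names are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, Iterator, List, TextIO, Tuple

# (header, sequence, plus line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Stream 4-line FASTQ records one at a time (assumes no line wrapping)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near " + header.strip())
        yield header.rstrip("\n"), seq, plus, qual


def sample_fastq(handle: TextIO, fraction: float, seed: int = 0) -> Iterator[Record]:
    """Keep each record independently with probability `fraction`."""
    rng = random.Random(seed)  # seeded so the same subset can be reproduced
    for rec in read_fastq(handle):
        if rng.random() < fraction:
            yield rec


def per_position_stats(records: Iterable[Record], offset: int = 33):
    """Return per-position base counts and mean quality (assumed Phred+offset)."""
    base_counts: List[Counter] = []
    qual_sums: List[int] = []
    n_at_pos: List[int] = []
    for _, seq, _, qual in records:
        for i, (base, q) in enumerate(zip(seq, qual)):
            if i == len(base_counts):  # first read reaching this position
                base_counts.append(Counter())
                qual_sums.append(0)
                n_at_pos.append(0)
            base_counts[i][base] += 1
            qual_sums[i] += ord(q) - offset
            n_at_pos[i] += 1
    mean_quals = [s / n for s, n in zip(qual_sums, n_at_pos)]
    return base_counts, mean_quals
```

Usage might look like this, where `reads.fastq` is a placeholder name:

`with open("reads.fastq") as fh: counts, quals = per_position_stats(sample_fastq(fh, 0.05))`

Because records are streamed, memory stays roughly constant. Sampling then mainly cuts the time spent on the statistics.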
Alex Gutteridge wrote:
> On Wed, 22 Sep 2010 08:12:27 -0400, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
>> On Wed, Sep 22, 2010 at 8:07 AM, <Daniel.Berner at unibas.ch> wrote:
>>> Hi there
>>> I have a Solexa FASTQ file containing some 27 million reads (file size
>>> approx. 4 GB). My plan is to load this into R for subsequent editing
>>> with ShortRead tools. R is the 64-bit Linux build, and the computer has 8 GB of RAM.
>>> Can anybody provide a rough estimate of how long the input will take?
>> Depending on disk and network speeds, perhaps a few minutes. 8 GB is
>> small, though. You'll have to give it a try to see if it all fits into memory.
> Yes, my experience with ShortRead and files of this size was that 8 GB was not
> enough. If it is compatible with your planned analysis, I would split the
> file by chromosome and work from those.
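Splitting by chromosome presupposes that the reads have already been aligned, because a raw FASTQ record carries no chromosome information. Below is a minimal Python sketch with these assumptions:

- The alignments have been exported as SAM text, where the reference name (RNAME) is the third tab-separated column.
- Reference names are safe to use as file names.
- The number of references is small enough to keep one output file open per reference.

The function name is hypothetical.

```python
import os
from typing import Dict, List, TextIO


def split_sam_by_reference(sam_path: str, out_dir: str) -> Dict[str, int]:
    """Write each alignment to <out_dir>/<RNAME>.sam and return counts per file.

    Header lines ('@...') are copied into every output file; unmapped reads
    (RNAME '*') are written to 'unmapped.sam'.
    """
    os.makedirs(out_dir, exist_ok=True)
    header_lines: List[str] = []
    handles: Dict[str, TextIO] = {}
    counts: Dict[str, int] = {}
    try:
        with open(sam_path) as sam:
            for line in sam:
                if line.startswith("@"):
                    header_lines.append(line)  # SAM headers precede alignments
                    continue
                fields = line.split("\t")
                if len(fields) < 11:  # SAM has 11 mandatory columns
                    raise ValueError("not a SAM alignment line: " + line[:40])
                name = "unmapped" if fields[2] == "*" else fields[2]
                if name not in handles:
                    out = open(os.path.join(out_dir, name + ".sam"), "w")
                    out.writelines(header_lines)
                    handles[name] = out
                    counts[name] = 0
                handles[name].write(line)
                counts[name] += 1
    finally:
        for out in handles.values():
            out.close()
    return counts
```

Each per-chromosome file can then be loaded and analysed on its own, which keeps peak memory closer to the size of the largest chromosome's reads than to the whole run.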
Marc Noguera i Julian, PhD
Genomics unit / Bioinformatics
Institut de Medicina Predictiva i Personalitzada
del Càncer (IMPPC)
Carretera de Can Ruti
Camí de les Escoles s/n
08916 Badalona, Barcelona