[BioC] fastq upload time

Marc Noguera mnoguera at imppc.org
Wed Sep 22 16:59:38 CEST 2010


If you want to check quality and per-position base frequencies, I would
suggest randomly sampling the fastq file to extract, say, 5-10% of the
reads and running the quality analysis on that subset.

Global totals (e.g. read counts) will no longer be exact, but most of the
quality information will still be representative, at a fraction of the
memory and CPU cost.

I wrote some C code for this, if you are interested.
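
To illustrate the sampling idea, here is a minimal sketch in Python (it is
not the C code mentioned above). It assumes a plain, uncompressed fastq file
with exactly four lines per read and no wrapped sequence lines; the function
names and the default 5% fraction are only illustrative.

```python
import random
from itertools import islice


def sample_fastq(lines, fraction=0.05, seed=0):
    """Yield each 4-line fastq record independently with probability `fraction`.

    `lines` is any iterable of text lines (e.g. an open file handle).
    Assumes exactly four lines per record (an assumption of this sketch).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if not record:
            break  # end of input
        if len(record) < 4:
            raise ValueError("truncated fastq record at end of input")
        if rng.random() < fraction:
            yield record


def sample_fastq_file(src_path, dst_path, fraction=0.05, seed=0):
    """Stream src_path and write the sampled records to dst_path.

    Returns the number of records written.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for record in sample_fastq(src, fraction=fraction, seed=seed):
            dst.writelines(record)
            kept += 1
    return kept
```

Because the file is streamed one record at a time, memory use stays small
regardless of the input size. The number of reads kept is only approximately
fraction x total, since each read is kept or dropped independently.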

marc

Alex Gutteridge wrote:
> On Wed, 22 Sep 2010 08:12:27 -0400, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
>   
>> On Wed, Sep 22, 2010 at 8:07 AM, <Daniel.Berner at unibas.ch> wrote:
>>
>>     
>>> Hi there
>>> I have a solexa fastq file containing some 27 million reads (file size
>>> approx. 4 GB). my plan is to upload this into R for subsequent editing
>>> with ShortRead tools. The R version is 64-bit linux, the computer has
>>> 8 GB RAM. Can anybody provide a rough estimate of how long the input
>>> will take? hours, days...?
>>>
>>>       
>> Depending on disk and network speeds, perhaps a few minutes.  8GB is
>> pretty small, though.  You'll have to give it a try to see if it all
>> fits into memory.
>>
>> Sean
>>     
>
> Yes, my experience with ShortRead and files this size was that 8GB was not
> enough. If it is compatible with your planned analysis I would split the
> file according to chromosome and work from those.
>
>   


-- 
-----------------------------------------------------
Marc Noguera i Julian, PhD
Genomics unit / Bioinformatics
Institut de Medicina Predictiva i Personalitzada
del Càncer (IMPPC)
B-10 Office
Carretera de Can Ruti
Camí de les Escoles s/n
08916 Badalona, Barcelona


