[BioC] GEOquery and parsing SOFT files

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Mon May 25 16:13:21 CEST 2009


Hello,

The getGEO function from GEOquery parses GEO SOFT files.  With a
particular GSE file (GSE13638), it took over 15 minutes on my
not-so-crappy machine to parse the file (a local file, download time
excluded).  I've written a simple parser in Perl, and parsing the same
file and storing the data in a nested hash/array structure takes ca. 2
seconds.  I'm pretty sure there is more essential processing done by
getGEO to organize the data into a GSE object, but still, there seems to
be an incredibly inefficient implementation underneath.
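
For illustration, here is a minimal sketch of the kind of simple parser
meant above, in Python rather than Perl.  It is not the script I timed,
and it is not how getGEO works internally.  It assumes the usual SOFT
line conventions: '^' starts an entity, '!' is an attribute, '#' is a
column description, and tab-delimited table rows sit between
'!..._table_begin' and '!..._table_end'.  The function name parse_soft
and the dictionary keys are made up for this example.

```python
from collections import defaultdict


def parse_soft(lines):
    """Parse SOFT-formatted lines into a list of nested dicts (a sketch)."""
    entities = []      # one dict per '^' entity block, in file order
    current = None     # the entity currently being filled in
    in_table = False   # True between *_table_begin and *_table_end

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("^"):
            # Entity indicator, e.g. "^SAMPLE = GSM1"
            kind, _, name = line[1:].partition("=")
            current = {
                "type": kind.strip(),
                "name": name.strip(),
                "attributes": defaultdict(list),  # keys may repeat
                "columns": {},                    # column name -> description
                "header": None,                   # first row of the table
                "rows": [],                       # remaining rows, as lists
            }
            entities.append(current)
            in_table = False
        elif current is None:
            continue  # ignore anything before the first entity
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            key = key.strip()
            if key.lower().endswith("_table_begin"):
                in_table = True
            elif key.lower().endswith("_table_end"):
                in_table = False
            else:
                current["attributes"][key].append(value.strip())
        elif in_table:
            fields = line.split("\t")
            if current["header"] is None:
                current["header"] = fields
            else:
                current["rows"].append(fields)
        elif line.startswith("#"):
            # Column description, e.g. "#VALUE = normalized signal"
            col, _, desc = line[1:].partition("=")
            current["columns"][col.strip()] = desc.strip()
    return entities
```

Since an open file handle yields lines, something like
parse_soft(open(path)) would do, with path pointing at the local SOFT
file.  Values are kept as strings; converting the table columns to
numbers is extra work that a fuller parser would have to do.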

I haven't looked at the source code yet, but here's a question:  what is
the likely reason getGEO is so slow?  Is it the parsing itself, or
rather wrapping the data into the appropriate structure?  Where should I
start to look for code to be improved?

vQ


