[BioC] R crashes with GEOmetadb

Sean Davis sdavis2 at mail.nih.gov
Thu Jun 30 15:26:43 CEST 2011


On Thu, Jun 30, 2011 at 8:50 AM, Hooiveld, Guido <Guido.Hooiveld at wur.nl> wrote:
> Hi Sean,
> Indeed, you are correct!
> Due to my inexperience with database queries and a clumsy interpretation of some example code, I inadvertently closed the connection to the database... Well, after omitting this line the example works fine now! :)
>
> One thing though: through GEOmetadb I locate 17751 CEL files for GPL96, whereas a query directly at GEO indicates it hosts a considerably larger number of these arrays (i.e. 28011 samples). Any idea what may cause this discrepancy?
> http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL96

If you change your query to "%CEL%" rather than "%CEL.gz", you will
pick up another 4k samples, but there are still about 6k samples
without CEL files.  GEO has not always required raw data.
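
A minimal sketch of how the two patterns differ, using a toy in-memory
table rather than the real GEOmetadb.sqlite file. The rows, the
example.org URLs, and the helper name count_like() are invented for
illustration; they only show the kinds of entries that "%CEL.gz" can
miss, and are not taken from GEO.

```r
library(DBI)

# Toy stand-in for the gsm table; every row here is made up.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
gsm <- data.frame(
  gsm = c("GSM1", "GSM2", "GSM3", "GSM4"),
  gpl = "GPL96",
  supplementary_file = c(
    "ftp://example.org/GSM1.CEL.gz",                                # ends in CEL.gz
    "ftp://example.org/GSM2.CEL.gz;ftp://example.org/GSM2.EXP.gz",  # CEL.gz not at the end
    "ftp://example.org/GSM3.CEL",                                   # uncompressed CEL
    NA                                                              # no raw data at all
  ),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "gsm", gsm)

# Hypothetical helper: count GPL96 samples whose supplementary_file
# matches a given LIKE pattern.
count_like <- function(con, pattern) {
  dbGetQuery(
    con,
    "select count(*) as n from gsm
     where gpl = 'GPL96' and supplementary_file like ?",
    params = list(pattern)
  )$n
}

n_strict <- count_like(con, "%CEL.gz")  # pattern must match the end of the field
n_loose  <- count_like(con, "%CEL%")    # 'CEL' anywhere in the field
n_total  <- dbGetQuery(con, "select count(*) as n from gsm where gpl = 'GPL96'")$n

dbDisconnect(con)  # disconnect only after the last query

print(c(strict = n_strict, loose = n_loose, total = n_total))
```

Against the real database, the same three counts would correspond to
the 17751 samples found originally, the additional samples picked up by
"%CEL%", and the platform total including samples without CEL files.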

As an aside, the GEOmetadb database is updated often, but not
continuously, so there will be a bit of a lag (sample numbers may be
lower in GEOmetadb than in GEO proper).

Sean


> Thanks again for your assistance,
> Guido
>
> -----Original Message-----
> From: seandavi at gmail.com [mailto:seandavi at gmail.com] On Behalf Of Sean Davis
> Sent: Thursday, June 30, 2011 14:03
> To: Hooiveld, Guido
> Cc: bioconductor (bioconductor at stat.math.ethz.ch); Seth Falcon
> Subject: Re: [BioC] R crashes with GEOmetadb
>
> See below.
>
> On Wed, Jun 29, 2011 at 11:36 AM, Hooiveld, Guido <Guido.Hooiveld at wur.nl> wrote:
>> Dear Sean and others,
>>
>> I am exploring the functionality of 'GEOmetadb'. I am specifically interested in downloading all CEL files for arrays run on a certain platform.
>> To this end I am using the example mentioned in the vignette of GEOmetadb, which should retrieve the number of GEO entries and CEL files for the Affymetrix HGU133A array (page 8 of the vignette).
>> However, when executing that code R crashes and needs to exit...
>> To me the error messages are not informative, but maybe you can deduce what is going wrong. Any feedback is appreciated.
>>
>> Regards,
>> Guido
>>
>>
>> R version 2.13.0 (2011-04-13)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>>
>>> library(GEOmetadb)
>> Loading required package: GEOquery
>> Loading required package: Biobase
>>
>> Welcome to Bioconductor
>>
>>  Vignettes contain introductory material. To view, type
>>  'browseVignettes()'. To cite Bioconductor, see
>>  'citation("Biobase")' and for packages 'citation("pkgname")'.
>>
>> Setting options('download.file.method.GEOquery'='curl')
>> Loading required package: RSQLite
>> Loading required package: DBI
>>> getSQLiteFile()
>> trying URL 'http://gbnci.abcc.ncifcrf.gov/geo/GEOmetadb.sqlite.gz'
>> Content type 'text/plain; charset=ISO-8859-1' length 109446149 bytes (104.4 Mb)
>> opened URL
>> ================================================
>> downloaded 104.4 Mb
>>
>> Unzipping...
>> Metadata associate with downloaded file:
>>                name               value
>> 1     schema version                 1.0
>> 2 creation timestamp 2011-06-18 09:50:00
>> [1] "/home.local/guidoh/GEOmetadb.sqlite"
>>>
>>> con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
>>> dbDisconnect(con)
>
> Sorry, Guido.  I missed this point in my first pass through your email.  Here, you disconnect the connection.
>
>> [1] TRUE
>>>
>>> rs <- dbGetQuery(con,paste("select gsm,supplementary_file",
>> +                            "from gsm where gpl='GPL96'",
>> +                            "and supplementary_file like '%CEL.gz'"))
>
> Here, you are using a disconnected connection object (con) to perform the query; it should fail with an error message but probably not a segmentation fault.  If you DO NOT disconnect the connection object, this query works fine.  Perhaps RSQLite should have a check of the connection object to make sure that it is connected to avoid the segmentation fault?
>
> Sean
>
>
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2011-02-26 r54608)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] RSQLite_0.9-4 DBI_0.2-5
>
>
>> *** caught segfault ***
>> address 0x8, cause 'memory not mapped'
>>
>> Traceback:
>> 1: .Call("RS_SQLite_exec", conId, statement, bind.data, PACKAGE =
>> .SQLitePkgName)
>> 2: sqliteExecStatement(con, statement, bind.data)
>> 3: sqliteQuickSQL(conn, statement, ...)
>> 4: dbGetQuery(con, paste("select gsm,supplementary_file", "from gsm
>> where gpl='GPL96'",     "and supplementary_file like '%CEL.gz'"))
>> 5: dbGetQuery(con, paste("select gsm,supplementary_file", "from gsm
>> where gpl='GPL96'",     "and supplementary_file like '%CEL.gz'"))
>>
>> Possible actions:
>> 1: abort (with core dump, if enabled)
>> 2: normal R exit
>> 3: exit R without saving workspace
>> 4: exit R saving workspace
>> Selection: dim(rs)
>> Selection:
>>
>>
>> ---------------------------------------------------------
>> Guido Hooiveld, PhD
>> Nutrition, Metabolism & Genomics Group Division of Human Nutrition
>> Wageningen University Biotechnion, Bomenweg 2
>> NL-6703 HD Wageningen
>> the Netherlands
>> tel: (+)31 317 485788
>> fax: (+)31 317 483342
>> email:      guido.hooiveld at wur.nl
>> internet:   http://nutrigene.4t.com
>> http://www.researcherid.com/rid/F-4912-2010
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
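
The disconnected-connection problem discussed in the quoted exchange
above can be guarded against in user code. The sketch below assumes a
recent DBI/RSQLite in which dbIsValid() is available (it was not part of
the RSQLite 0.9-4 shown in the sessionInfo() output), and uses a toy
in-memory table instead of GEOmetadb.sqlite. The helper name
safe_query() is hypothetical.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "gsm",
             data.frame(gsm = "GSM1", gpl = "GPL96", stringsAsFactors = FALSE))

# Hypothetical helper: refuse to run a query on a closed connection
# instead of handing the stale handle to the driver.
safe_query <- function(con, sql) {
  if (!dbIsValid(con)) {
    stop("connection is closed; reconnect with dbConnect() first")
  }
  dbGetQuery(con, sql)
}

# Run all queries first ...
rs <- safe_query(con, "select gsm from gsm where gpl = 'GPL96'")

# ... and disconnect only once the last query has finished.
dbDisconnect(con)

# A query on the closed connection now stops with a plain R error.
res <- tryCatch(
  safe_query(con, "select gsm from gsm"),
  error = function(e) conditionMessage(e)
)
print(res)
```

The point is only the ordering: dbDisconnect() belongs after the final
dbGetQuery() call, and a validity check turns a misplaced disconnect
into an ordinary error message.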


