[BioC] GEOmetadb query to retrieve sample groups

Sean Davis sdavis2 at mail.nih.gov
Mon Jun 10 18:24:06 CEST 2013


Hi, Tom.

Sorry to take so long to get back to you.  See below.

On Thu, Jun 6, 2013 at 11:15 AM, Thomas H. Hampton
<Thomas.H.Hampton at dartmouth.edu> wrote:
> The following getGEO query retrieves data files and metadata for a recent GEO submission of mine,
> one that has been curated:
>
> GDS4252 <- getGEO("GDS4252")
> Columns(GDS4252)
>> str(Columns(GDS4252))
> 'data.frame': 16 obs. of  4 variables:
> $ sample            : Factor w/ 16 levels "GSM754979","GSM754980",..: 5 6 7 8 1 2 3 4 13 14 ...
> $ genotype/variation: Factor w/ 2 levels "CFTR  mutant",..: 1 1 1 1 1 1 1 1 2 2 ...
> $ agent             : Factor w/ 2 levels "PA01","unexposed": 1 1 1 1 2 2 2 2 1 1 ...
>
> The folks at NCBI have correctly created two factors with two levels to describe the 16 samples in my experiment.
>
> I am interested in retrieving similar information using GEOmetadb, but this has proved problematic.
>
> getSQLiteFile(destdir = getwd(), destfile = "GEOmetadb.sqlite.gz")
>
> con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
> dat <- dbGetQuery(con, "select * from gds where gds = 'GDS4252'")
>
>> dat
>  [1] ID                       gds                      title
>  [4] description              type                     pubmed_id
>  [7] gpl                      platform_organism        platform_technology_type
> [10] feature_count            sample_organism          sample_type
> [13] channel_count            sample_count             value_type
> [16] gse                      order                    update_date
> <0 rows> (or 0-length row.names)
>
> It seems, for starters, that this GDS identifier for my particular submission isn't accounted for in the current
> database.
>
> Others are, so it looks like my syntax and so forth is ok:
>
>> dat <- dbGetQuery(con, "select gds from gds limit 10")
>> dat
>      gds
> 1   GDS5
> 2   GDS6
> 3  GDS10
> 4  GDS12
> 5  GDS15
> 6  GDS16
> 7  GDS17
> 8  GDS18
> 9  GDS19
> 10 GDS20
>
>
> There is also the question of where a set of fields (variable in number) describing sample factors and their levels would actually "live"
> in the SQLite database.

It does appear that our update script has a bug; GDS4252 is not
present, so we'll check on that.

> This information does not seem to be an attribute of the GDS in any case:

You'll want to check out the gds_subset table for details of the GDS groups.
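
As a rough sketch of how that lookup might go: the code below assumes
gds_subset has columns named gds, type, description and sample_id, with
sample_id holding a comma-separated list of GSM accessions (running
dbListFields(con, "gds_subset") will confirm the layout in your copy).
The helper name gds_groups is only illustrative and is not part of
GEOmetadb.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: one row per (sample, variable) pair for a GDS.
# ASSUMPTION: gds_subset has columns gds, type, description, sample_id,
# and sample_id is a comma-separated string of GSM accessions.
gds_groups <- function(con, gds_id) {
  sub <- dbGetQuery(
    con,
    "select type, description, sample_id from gds_subset where gds = ?",
    params = list(gds_id)
  )
  if (nrow(sub) == 0) {
    # GDS not present in this copy of the database
    return(data.frame(sample = character(0),
                      variable = character(0),
                      level = character(0),
                      stringsAsFactors = FALSE))
  }
  # Split each subset's sample list and repeat its labels to match
  ids <- strsplit(sub$sample_id, ",", fixed = TRUE)
  data.frame(
    sample   = trimws(unlist(ids)),
    variable = rep(sub$type, lengths(ids)),
    level    = rep(sub$description, lengths(ids)),
    stringsAsFactors = FALSE
  )
}

# Usage against the downloaded file (GDS5 is one of the IDs listed above):
# con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
# groups <- gds_groups(con, "GDS5")
# dbDisconnect(con)
```

An empty result for a given GDS means it is missing from the database,
as is currently the case for GDS4252.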

>> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gds'")
>> dat
>                   FieldName
> 1                        ID
> 2             channel_count
> 3               description
> 4             feature_count
> 5                       gds
> 6                     order
> 7                  platform
> 8         platform_organism
> 9  platform_technology_type
> 10                pubmed_id
> 11         reference_series
> 12             sample_count
> 13          sample_organism
> 14              sample_type
> 15                    title
> 16                     type
> 17              update_date
> 18               value_type
>
> Nor does it seem to be a feature stored in the samples:
>
>> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gsm'")
>> dat
>                 FieldName
> 1                      ID
> 2           channel_count
> 3     characteristics_ch1
> 4     characteristics_ch2
> 5                 contact
> 6         data_processing
> 7          data_row_count
> 8             description
> 9    extract_protocol_ch1
> 10   extract_protocol_ch2
> 11                    gpl
> 12                    gse
> 13                    gsm
> 14           hyb_protocol
> 15              label_ch1
> 16              label_ch2
> 17     label_protocol_ch1
> 18     label_protocol_ch2
> 19       last_update_date
> 20           molecule_ch1
> 21           molecule_ch2
> 22           organism_ch1
> 23           organism_ch2
> 24        source_name_ch1
> 25        source_name_ch2
> 26                 status
> 27        submission_date
> 28     supplementary_file
> 29                  title
> 30 treatment_protocol_ch1
> 31 treatment_protocol_ch2
> 32                   type
>
>
> Any advice greatly appreciated.


