[BioC] How to use GEOquery to extract more than the default information from a GSE

James F. Reid james.reid at ifom-ieo-campus.it
Fri Jul 24 15:38:31 CEST 2009

Hi Marco,

I'm not sure what you mean by 'more than default information'.

Using GEOquery can be a bit complicated if the GEO series (GSE) contains 
multiple platforms, but in your case you're fine because there is only one.
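To check how many platforms a series uses, you can look at the list that getGEO() returns. With the default GSEMatrix = TRUE it holds one ExpressionSet per series matrix file, and GEO writes one series matrix file per platform. The helper below, platform_summary() (a hypothetical name, not part of GEOquery or Biobase), is a minimal sketch that tabulates the platform accession (annotation()), feature count and sample count of each element. For GSE9820 you would expect a single row.

```r
library(Biobase)  # annotation(), dim methods for ExpressionSet

## Hypothetical helper: one row per ExpressionSet in the list returned by
## getGEO(), i.e. one row per platform (GPL) used in the series.
platform_summary <- function(gse_list) {
  data.frame(
    file     = names(gse_list),
    platform = vapply(gse_list, annotation, character(1), USE.NAMES = FALSE),
    features = vapply(gse_list, function(e) as.integer(nrow(e)),
                      integer(1), USE.NAMES = FALSE),
    samples  = vapply(gse_list, function(e) as.integer(ncol(e)),
                      integer(1), USE.NAMES = FALSE),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
```

Usage, e.g.:

gse <- getGEO("GSE9820")
platform_summary(gse)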

You can get a complete ExpressionSet, which stores sample annotation, 
platform annotation and expression values, by doing:

gse <- getGEO("GSE9820")
##[1] "GSE9820_series_matrix.txt.gz"

getGEO() returns a list with one ExpressionSet per series matrix file; 
printing the first element, gse[[1]], shows:
ExpressionSet (storageMode: lockedEnvironment)
assayData: 20589 features, 153 samples
   element names: exprs
   sampleNames: GSM247703, GSM247704, ..., GSM247855  (153 total)
   varLabels and varMetadata description:
     title: NA
     geo_accession: NA
     ...: ...
     data_row_count: NA
     (33 total)
   featureNames: ILMN_10000, ILMN_10001, ..., ILMN_9999  (20589 total)
   fvarLabels and fvarMetadata description:
     ID: NA
     GB_ACC: NA
     ...: ...
     (6 total)
   additional fvarMetadata: Column, Description
experimentData: use 'experimentData(object)'
Annotation: GPL6255

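The platform annotation columns are listed by fvarLabels(gse[[1]]):
[1] "ID"         "GB_ACC"     "SYMBOL"     "DEFINITION" "ONTOLOGY"

fData(gse[[1]]) contains all the information for the platform, 
varLabels(gse[[1]]) gives you the labels of the sample information 
(the values themselves are in pData(gse[[1]])), and you can get to the 
expression values by means of exprs(gse[[1]]).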
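Putting those accessors together, a common next step is to attach a platform annotation column such as SYMBOL to the expression values. annotated_exprs() below is a hypothetical helper, a minimal sketch assuming the requested columns exist in fData() (SYMBOL does for GPL6255, per the column list shown above, but column names differ between platforms). It relies on Biobase requiring the rows of fData() to match the rows of exprs() in the same order.

```r
library(Biobase)  # fData(), exprs()

## Hypothetical helper: expression matrix with selected platform
## annotation columns (taken from fData) bound on the left.
annotated_exprs <- function(eset, cols = "SYMBOL") {
  ann <- fData(eset)
  missing <- setdiff(cols, colnames(ann))
  if (length(missing) > 0)
    stop("columns not in fData: ", paste(missing, collapse = ", "))
  # fData rows and exprs rows share the same featureNames, in the same order
  data.frame(ann[, cols, drop = FALSE], exprs(eset),
             check.names = FALSE, stringsAsFactors = FALSE)
}
```

Usage, e.g.:

eset <- gse[[1]]
x <- annotated_exprs(eset, c("ID", "SYMBOL"))
head(pData(eset)[, c("title", "geo_accession")])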


Manca Marco (PATH) wrote:
> Dear Sean and dear bioconductors,
> I am writing to ask you for a source of inspiration (code pieces, notes, references, whatever you might think appropriate) to import array annotation and other data from the GSE I am trying to work with (namely the GSE9820) into my eset.
> I have read on GEOquery's vignette that this is actually possible, despite being a bit tricky:
> "So, using a combination of lapply on the GSMList, one can extract as many columns of interest as necessary to build the data structure of choice. Because the GSM data from the GEO website are fully downloaded and included in the GSE object, one can extract foreground and background as well as quality for two-channel arrays, for example. Getting array annotation is also a bit more complicated, but by replacing "platform" in the lapply call to get platform information for each array, one can get other information associated with each array. Future work with this package will likely focus on better tools for manipulating GSE data" From http://www.bioconductor.org/packages/2.4/bioc/vignettes/GEOquery/inst/doc/GEOquery.pdf Page 22 of 22
> ...but I can't find any hints anywhere.
> Thank you in advance for your patience and support.
> My best regards,
> Marco
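As for the lapply-over-GSMList approach in the vignette passage quoted above: it works on the full SOFT-format GSE object returned by getGEO("GSE9820", GSEMatrix = FALSE), in which each GSM carries its own data table and header. The code below is a sketch of that idea, not code taken from the vignette; column_matrix(), gsm_column_matrix() and gsm_meta_field() are hypothetical helper names. It assumes a single platform, that every GSM table has an ID_REF column plus the column you ask for (VALUE by default), and that the ID column of the first GPL table gives the probe order. gsm_meta_field() is the "replace platform in the lapply call" trick: it pulls one header field (platform_id by default) from every array.

```r
## Core step (hypothetical helper): align one column of each per-sample
## table to a common probe order and bind the results into a matrix.
column_matrix <- function(tables, probes, column = "VALUE", id = "ID_REF") {
  probes <- as.character(probes)
  vals <- lapply(tables, function(tab) {
    if (!all(c(id, column) %in% names(tab)))
      stop("table lacks '", id, "' or '", column, "' column")
    # NA where a probe is absent from this sample's table
    tab[[column]][match(probes, as.character(tab[[id]]))]
  })
  m <- do.call(cbind, vals)  # column names come from the GSM names
  rownames(m) <- probes
  m
}

## Probe x sample matrix from a GSE object (GSEMatrix = FALSE),
## assuming a single platform.
gsm_column_matrix <- function(gse, column = "VALUE") {
  tables <- lapply(GEOquery::GSMList(gse), GEOquery::Table)
  probes <- GEOquery::Table(GEOquery::GPLList(gse)[[1]])$ID
  column_matrix(tables, probes, column)
}

## One header field per GSM; multi-valued fields are joined with "; ".
gsm_meta_field <- function(gse, field = "platform_id") {
  vapply(GEOquery::GSMList(gse),
         function(g) paste(GEOquery::Meta(g)[[field]], collapse = "; "),
         character(1))
}
```

Usage, e.g.:

gse <- getGEO("GSE9820", GSEMatrix = FALSE)
vals <- gsm_column_matrix(gse)
platforms <- gsm_meta_field(gse)
titles <- gsm_meta_field(gse, "title")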
