[BioC] GEOquery: how to extract experimental data? (confused)

Sean Davis sdavis2 at mail.nih.gov
Tue Aug 16 13:36:41 CEST 2011


On Tue, Aug 16, 2011 at 7:20 AM,  <J.delasHeras at ed.ac.uk> wrote:
>
> I have been until now downloading GEO data directly to my computer and using
> basic R functions to load tables and process them.
> It works, but I figured I would probably save time if I learn to use the
> GEOquery package, which looks promising.
>
> However, I'm failing tremendously at my first attempt. I can get a lot of
> good information, except the actual experiment data... and it seems to be
> there, but can't get to it!
>
> Example. I'm trying to get GSE19044, which contains 42 samples and uses the
> Illumina WG6 platform, which is great as I'm familiar with it.
>
> so I do:
>
> library(GEOquery)
> u = getGEO('GSE19044')
> show(u)
>
>> show(u)
>
> $GSE19044_series_matrix.txt.gz
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 45281 features, 42 samples
>  element names: exprs
> protocolData: none
> phenoData
>  sampleNames: GSM471318, GSM471319, ..., GSM471359  (42 total)
>  varLabels and varMetadata description:
>    title: NA
>    geo_accession: NA
>    ...: ...
>    data_row_count: NA
>    (39 total)
> featureData
>  featureNames: ILMN_1212602, ILMN_1212603, ..., ILMN_3163582  (45281 total)
>  fvarLabels and fvarMetadata description:
>    ID: NA
>    Species: NA
>    ...: ...
>    SPOT_ID: NA
>    (31 total)
>  additional fvarMetadata: Column, Description
> experimentData: use 'experimentData(object)'
> Annotation: GPL6887
>
> It looks good. It looks like what I want is the 'assayData'. But I can't get
> to it.
>
> 'u' is a list, containing one element...
>>
>> class(u)
>
> [1] "list"
>>
>> length(u)
>
> [1] 1
>
>> class(u[[1]])
>
> [1] "ExpressionSet"
> attr(,"package")
> [1] "Biobase"
>
> ok, so I rename that, and look at its structure:
>
> eset<-u[[1]]
> str(eset)
>
>> str(eset)
>
> Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
>  ..@ assayData        :<environment: 0x0645ec5c>
>  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"]
> [...] (omitted for brevity)
>
> I can extract the sample names, the basic annotation/probe identity etc
> easily:
> eset@phenoData@data #samples
> eset@featureData@data #annotation
>
> but how do I get into 'assayData'?
> from the 'show(u)' it looks like it contains what I am after: 45281
> features, 42 samples ... but it's class 'environment' and that's throwing me
> off.
>
> I was looking into the GEOquery user guide, but I'm still none the wiser.

Hi, Jose.

Sorry this was confusing for you.  Your eset object above is an
ExpressionSet, one of the standard classes for storing gene
expression data in Bioconductor; GEOquery uses this class where
possible to store GEO data, which makes downstream processing
with other Bioconductor packages easier.  The assayData slot is an
environment, so rather than reading it directly, you typically get
the expression data from an ExpressionSet by doing:

assayDataElement(eset,'exprs')

or the simpler shorthand:

exprs(eset)
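
As a minimal sketch of the two calls above, the following builds a
small toy ExpressionSet by hand so that it runs without downloading
anything from GEO.  The matrix m, the object name toy, and the
probe/sample names are made up for illustration and are not taken from
GSE19044; the only assumption is that Biobase is installed.

```r
library(Biobase)

# Toy expression matrix: 3 features (rows) x 2 samples (columns).
# All names and values here are hypothetical.
m <- matrix(c(7.1, 8.2, 6.5, 7.3, 8.0, 6.9), nrow = 3,
            dimnames = list(c("probe1", "probe2", "probe3"),
                            c("sampleA", "sampleB")))
toy <- ExpressionSet(assayData = m)

# Both accessors return the same features-by-samples matrix,
# even though the assayData slot itself is an environment.
e1 <- exprs(toy)
e2 <- assayDataElement(toy, "exprs")
print(identical(e1, e2))
print(dim(e1))
```

With the real data, the same calls on eset should give a matrix with
one row per probe and one column per GSM sample.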

Similarly, to get the sample variables, you can do:

pData(eset)
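
The accessor functions can stand in for the direct slot access used in
the question; fData() is the counterpart of pData() for the feature
(probe) annotation.  Here is a small sketch on a hand-built toy object,
again assuming only that Biobase is installed; the data frames pd and
fd and all names in them are hypothetical.

```r
library(Biobase)

# Hypothetical toy data: 3 probes, 2 samples.
m <- matrix(1:6 / 2, nrow = 3,
            dimnames = list(c("probe1", "probe2", "probe3"),
                            c("sampleA", "sampleB")))
pd <- data.frame(title = c("control", "treated"),
                 row.names = c("sampleA", "sampleB"))
fd <- data.frame(Symbol = c("g1", "g2", "g3"),
                 row.names = c("probe1", "probe2", "probe3"))

toy <- ExpressionSet(assayData = m,
                     phenoData = AnnotatedDataFrame(data = pd),
                     featureData = AnnotatedDataFrame(data = fd))

samples  <- pData(toy)  # same data frame as toy@phenoData@data
features <- fData(toy)  # same data frame as toy@featureData@data

# Rows of the sample table line up with columns of the expression matrix.
print(identical(rownames(samples), colnames(exprs(toy))))
```

Using the accessors rather than the slots keeps code working if the
internal layout of the class changes.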

To get more help on ExpressionSet, you can do
help("ExpressionSet-class") and read the related Biobase vignette.

I hope that clears things up.

Sean


