[BioC] Normalization of array data from GEO repository
James F. Reid
james.reid at ifom-ieo-campus.it
Tue Jul 14 11:47:09 CEST 2009
Caveat: this is my understanding, and I might be quite wrong.
There is indeed no synchronization between the two databases, for lack of
a common standard (each has its own flavour of MAGE-ML).
In addition to investigators submitting to both repositories,
ArrayExpress also imports experiments from GEO according to certain
criteria. These are prefixed by 'E-GEOD' in the experiment ID. Querying
ArrayExpress for these returns 5155 such experiments out of a total of
8372. GEO contains 12810 Series (experiments), so I would say GEO does
contain more data.
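
As a rough check on the figures above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function and key names are made up here, and it assumes (the message does not say so) that each 'E-GEOD' experiment corresponds to exactly one GEO Series. Experiments submitted directly to both repositories are not accounted for.

```python
# Counts quoted in the message above (July 2009 snapshot).
AE_TOTAL = 8372      # all ArrayExpress experiments
AE_FROM_GEO = 5155   # ArrayExpress experiments with an 'E-GEOD' ID
GEO_SERIES = 12810   # GEO Series (GSE) records


def overlap_summary(ae_total, ae_from_geo, geo_series):
    """Return simple overlap figures for the two repositories.

    Assumption (hypothetical, not from the message): every E-GEOD
    experiment maps to exactly one GEO Series.
    """
    return {
        # ArrayExpress experiments that were not imported from GEO
        "ae_native": ae_total - ae_from_geo,
        # fraction of ArrayExpress that was imported from GEO
        "ae_share_from_geo": ae_from_geo / ae_total,
        # fraction of GEO Series that also appear in ArrayExpress
        "geo_share_in_ae": ae_from_geo / geo_series,
        # GEO Series with no imported counterpart in ArrayExpress
        "geo_only": geo_series - ae_from_geo,
    }


summary = overlap_summary(AE_TOTAL, AE_FROM_GEO, GEO_SERIES)
for name, value in summary.items():
    # Print fractions as percentages and counts as plain integers.
    if isinstance(value, float):
        print(f"{name}: {value:.1%}")
    else:
        print(f"{name}: {value}")
```

Under that assumption, well over half of ArrayExpress at the time consisted of GEO imports, while most GEO Series had no imported counterpart in ArrayExpress.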
Sean Davis wrote:
> On Wed, Jul 8, 2009 at 6:16 AM, Joern Toedling <Joern.Toedling at curie.fr> wrote:
>> just a small addendum: you may also want to have a look at the ArrayExpress
>> package which allows the user to retrieve data sets from the ArrayExpress
>> database at EBI and returns the data in form of an AffyBatch, NChannelSet,
>> RGList or the like. Since GEO and ArrayExpress are regularly synchronized,
>> you may be able to find your data sets of interest there as well.
> Actually, ArrayExpress and GEO are NOT synchronized. There are some
> overlaps where investigators have submitted to both and for other reasons,
> but GEO is still the larger of the two and they each contain largely
> non-overlapping data sets.
>> On Tue, 7 Jul 2009 13:59:19 -0400, Steve Lianoglou wrote
>>> On Jul 7, 2009, at 5:38 AM, Aleš Maver wrote:
>>>> Hi all,
>>>> I have obtained several GEO Series (GSE) entries from the GEO repository
>>>> using the getGEO function (GEOquery package).
>>>> Data obtained in this manner is stored in the ExpressionSet class. The
>>>> problem is that I don't know how to perform quality control analyses and
>>>> normalization procedures on ExpressionSet data, because functions like expresso
>>>> (affy package) work only on AffyBatch objects. Is there anything I am
>>>> missing?
>>> Sorry, I've never used the GEOquery package before, so I can't speak
>>> much to that, but I'd be surprised if there isn't an option to
>>> return your results as an AffyBatch object, because I'd dare say
>>> that you can get most of the data from GEO in its raw format (e.g.,
>>> CEL files or whatever).
>>>> And does anyone know whether data in the GEO repository is already
>>>> normalized or not?
>>> It depends, sometimes you aren't given the raw files: sometimes the
>>> data is from a custom array, or I've also seen some datasets
>>> provided in the post-processed form (already MAS5 normalized, for
>>> example), but it's been my experience that you can get the raw data
>>> for most of the experiments you find there.
>>> Also, for array quality assessment, look into the
>>> arrayQualityMetrics package:
>>> Hope that helps,
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor