[BioC] Normalization of array data from GEO repository

James F. Reid james.reid at ifom-ieo-campus.it
Tue Jul 14 11:47:09 CEST 2009


Hi,

A caveat: this is my understanding, and I might be quite wrong.

There is indeed no synchronization between the two databases, for lack of
a common standard (each has its own flavour of MAGE-ML).
In addition to investigators submitting to both repositories, 
ArrayExpress also imports experiments from GEO according to certain 
criteria. These are prefixed by 'E-GEOD' in the experiment ID. Querying 
ArrayExpress for these returns 5155 such experiments out of a total of
8372. GEO contains 12810 Series (experiments), so I would say that GEO
does contain more data.
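
As a rough check of those figures, here is a small sketch in Python (chosen
purely for illustration; the variable names are made up, and the counts are
just the ones quoted above). It assumes that each E-GEOD experiment
corresponds to exactly one GEO Series, which may not hold in every case.

```python
# Counts quoted above (July 2009); variable names are illustrative only.
ae_total = 8372        # all ArrayExpress experiments
ae_from_geo = 5155     # ArrayExpress experiments with an 'E-GEOD' prefix
geo_series = 12810     # GEO Series

# Experiments submitted directly to ArrayExpress rather than imported from GEO.
ae_native = ae_total - ae_from_geo

# Share of ArrayExpress that was imported from GEO.
share_ae_imported = ae_from_geo / ae_total

# Share of GEO Series mirrored in ArrayExpress, ASSUMING one E-GEOD
# experiment per GEO Series.
share_geo_mirrored = ae_from_geo / geo_series

print(f"Native ArrayExpress experiments: {ae_native}")
print(f"ArrayExpress imported from GEO:  {share_ae_imported:.1%}")
print(f"GEO Series mirrored (assumed):   {share_geo_mirrored:.1%}")
```

On these numbers, roughly three fifths of ArrayExpress was imported from GEO,
while about 60% of the GEO Series had no ArrayExpress counterpart.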

HTH,
James.


Sean Davis wrote:
> On Wed, Jul 8, 2009 at 6:16 AM, Joern Toedling <Joern.Toedling at curie.fr> wrote:
> 
>> Hello,
>>
>> just a small addendum: you may also want to have a look at the ArrayExpress
>> package which allows the user to retrieve data sets from the ArrayExpress
>> database at EBI and returns the data in form of an AffyBatch, NChannelSet,
>> RGList or the like. Since GEO and ArrayExpress are regularly synchronized,
>> you may be able to find your data sets of interest there as well.
>>
> 
> Actually, ArrayExpress and GEO are NOT synchronized.  There are some
> overlaps where investigators have submitted to both and for other reasons,
> but GEO is still the larger of the two and they each contain largely
> non-overlapping data sets.
> 
> 
>> Regards,
>> Joern
>>
>>
>> On Tue, 7 Jul 2009 13:59:19 -0400, Steve Lianoglou wrote
>>> Hi,
>>>
>>> On Jul 7, 2009, at 5:38 AM, Aleš Maver wrote:
>>>
>>>> Hi all,
>>>> I have obtained several GEO Series (GSE) entries from the GEO
>>>> repository using the getGEO function (GEOquery package).
>>>> Data obtained in this manner are stored as ExpressionSet objects. The
>>>> problem is that I don't know how to perform quality control analyses
>>>> and normalization procedures on ExpressionSet data, because functions
>>>> like expresso (affy package) work only on AffyBatch objects. Is there
>>>> anything I am missing?
>>> Sorry, I've never used the GEOquery package before, so I can't speak
>>> much to that, but I'd be surprised if there isn't an option to
>>> return your results as an AffyBatch object, because I'd dare say
>>> that you can get most of the data from GEO in its raw format (e.g.,
>>> CEL files or whatever).
>>>
>>>> And does anyone know whether data in the GEO repository are already
>>>> normalised or not?
>>> It depends. Sometimes you aren't given the raw files: sometimes the
>>> data is from a custom array, and I've also seen some datasets
>>> provided in post-processed form (already MAS5 normalized, for
>>> example), but it's been my experience that you can get the raw data
>>> for most of the experiments you find there.
>>>
>>> Also, for array quality assessment, look into the
>>> arrayQualityMetrics package:
>>>
>>> http://www.bioconductor.org/packages/release/bioc/html/arrayQualityMetrics.html
>>>
>>> Hope that helps,
>>> -steve
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>


