[BioC] background correcting an eset from GEO query

Sean Davis seandavi at gmail.com
Wed Jan 6 02:41:30 CET 2010


On Tue, Jan 5, 2010 at 7:24 PM, jeremy wilson <jeremy.wilson88 at gmail.com> wrote:
> Dear BioConductors,
>
> I am using GEOquery to get valuable datasets from the GEO database for my
> analysis. Most of the datasets I require have no raw data submitted,
> so I have to rely on the SOFT files to get the expression set
> directly using the "getGEO" command. When I plot the intensities of
> the expression values, I see that none of the datasets I tried appear
> to be preprocessed (background corrected, normalized): I see negative
> expression values, and the data are neither normalized nor log
> transformed (my apologies if I am wrong).
>
> For example:
> gse<-getGEO("GSE1984")
> e<-exprs(gse$GSE1984_series_matrix.txt.gz)
> hist(e[,1], main="histogram of expression values", xlab="Untransformed expression values")
> elog=log2(e)
> hist(elog[,1], main="histogram of expression values", xlab="log transformed expression values")
>
> esetOrig<-gse$GSE1984_series_matrix.txt.gz
> hist(esetOrig)
> We can clearly see from the histograms that the arrays are not
> normalized. The same is true for the GSE4465 dataset, among others.

The data are described on the GEO website.  For example, see:

http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSM35348

Note that the VALUE column is described as "Affymetrix Signal".  In
this particular case, you would probably still need to contact the
original investigator to know exactly what this means, but you may be
correct that this does not represent an adequately normalized value.

However, the raw data ARE available for GSE1984.  You can get them like so:

getGEOSuppFiles('GSE1984')

This will download a tar file full of .CEL files.  You can use the
normal Bioconductor affy tools to work with the data and then
transplant the phenoData from the ExpressionSet you created above to
the resulting new ExpressionSet.
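
As a rough sketch of those steps (not tested against GSE1984 itself),
something like the following should work.  The helper names
gsm_from_celname and rma_from_geo are made up for illustration.  I am
assuming that the tar file is called <accession>_RAW.tar, that each CEL
file name starts with its GSM accession, and that all arrays are of the
same chip type; check these against what actually gets downloaded.

```r
# Sketch only: needs the Bioconductor packages GEOquery, affy and Biobase,
# plus network access when rma_from_geo() is actually called.

# Hypothetical helper: "some/dir/GSM35348.CEL.gz" -> "GSM35348".
# Assumes each CEL file name starts with its GSM accession.
gsm_from_celname <- function(paths) {
  sub("[._].*$", "", basename(paths))
}

# Hypothetical helper: download the supplementary tar file for `acc`,
# run RMA on the CEL files, and copy the sample annotation over from
# `geo_eset` (the ExpressionSet obtained via getGEO).
rma_from_geo <- function(acc, geo_eset) {
  GEOquery::getGEOSuppFiles(acc)                      # saves into ./<acc>/
  tarfile <- file.path(acc, paste0(acc, "_RAW.tar"))  # assumed file name
  celdir <- file.path(acc, "CEL")
  utils::untar(tarfile, exdir = celdir)

  cels <- affy::list.celfiles(celdir, full.names = TRUE)
  abatch <- affy::ReadAffy(filenames = cels)
  eset <- affy::rma(abatch)  # background correct, normalize, summarize

  # Rename samples to GSM ids so they line up with the GEO phenoData.
  ids <- gsm_from_celname(Biobase::sampleNames(eset))
  stopifnot(all(ids %in% Biobase::sampleNames(geo_eset)))
  Biobase::sampleNames(eset) <- ids
  Biobase::pData(eset) <- Biobase::pData(geo_eset)[ids, , drop = FALSE]
  eset
}
```

You would call it along the lines of
rma_from_geo("GSE1984", gse[[1]]), where gse is the list returned by
getGEO("GSE1984").  RMA is only one choice here; any of the usual affy
preprocessing methods could be swapped in at that step.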

> I am assuming the data we get in the expression set from getGEO for
> datasets like these are hence just the RAW intensity values summarized
> at probeset level somehow, but not bg corrected and normalized between
> arrays. I would hence like to do these steps one by one on the eset. I
> searched the web for packages that do bg correction and normalization
> on an eset. I did find normalize.ExpressionSet but could not find a bg
> correction method for an eset. I think it may not be possible to do a
> bg correction on an eset, as there is no spatial positional
> information for probes or probesets in an eset, unlike in an affybatch
> object.
> In case there is no bg correction method for an eset, please suggest
> how I should proceed from an eset from GEOquery to a bg corrected,
> normalized eset.

Unfortunately, there is no standard for GSE records, except that the
values are _supposed_ to be normalized in some fashion by the
investigators.  In most cases, they are, but that may not mean that
they would be normalized the same way if done by another person.  You
can either use the values in the full GSE record (getGEO with
GSEMatrix=FALSE, not the series matrix), if those values allow you to
renormalize, or you will need to download the raw data.  If neither is
available, then you are stuck writing to
the authors and hoping for the best.  As a note, GDS records are truly
normalized (GEO checks this), so those are generally a good bet if a
GDS is available.
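
To make the GSE route concrete, here is a minimal sketch.  The helper
names value_matrix and fetch_gse_values are hypothetical, and I am
assuming that every sample in the series is on the same platform and
that each sample table has ID_REF and VALUE columns.

```r
# Sketch only: fetch_gse_values() needs the GEOquery package and network
# access; value_matrix() is plain base R.

# Hypothetical helper: build a probes x samples matrix from a named list
# of per-sample tables, each with ID_REF and VALUE columns.
value_matrix <- function(tables, ids) {
  cols <- lapply(tables, function(tab) {
    # match() puts every sample in the same probe order
    as.numeric(as.character(tab$VALUE[match(ids, tab$ID_REF)]))
  })
  m <- do.call(cbind, cols)
  rownames(m) <- ids
  m
}

# Hypothetical helper: pull the per-sample VALUE columns out of the full
# GSE record (GSEMatrix = FALSE) rather than the series matrix.
fetch_gse_values <- function(acc) {
  gse <- GEOquery::getGEO(acc, GSEMatrix = FALSE)
  gsms <- GEOquery::GSMList(gse)
  # Columns() shows how the submitter described each column, e.g. VALUE
  print(GEOquery::Columns(gsms[[1]]))
  tables <- lapply(gsms, GEOquery::Table)
  ids <- as.character(tables[[1]]$ID_REF)  # assumes a single platform
  value_matrix(tables, ids)
}
```

If a GDS does exist for the series, GDS2eSet(getGEO(<GDS accession>))
will give you an ExpressionSet directly; I have not checked whether
there is one for GSE1984.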

Hope that helps.

Sean

> I would greatly appreciate your help. Thank you
>
> sessionInfo()
>
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] tools     stats     graphics  grDevices datasets  utils
> methods   base
>
> other attached packages:
>  [1] convert_1.21.1       marray_1.23.0        geneplotter_1.23.3
> lattice_0.17-26
>  [5] annotate_1.23.4      AnnotationDbi_1.7.20 genefilter_1.26.4
> affyPLM_1.22.0
>  [9] preprocessCore_1.7.9 gcrma_2.17.4         affy_1.23.12
> GEOquery_2.11.2
> [13] RCurl_1.2-1          bitops_1.0-4.1       limma_3.0.3
> Biobase_2.5.8
>
> loaded via a namespace (and not attached):
>  [1] affyio_1.13.5      Biostrings_2.13.54 DBI_0.2-4
> grid_2.10.0        IRanges_1.3.99
>  [6] RColorBrewer_1.0-2 RSQLite_0.7-3      splines_2.10.0
> survival_2.35-7    xtable_1.5-5


