[BioC] background correcting an eset from GEO query

jeremy wilson jeremy.wilson88 at gmail.com
Wed Jan 6 01:24:25 CET 2010


Dear BioConductors,

I am using GEOquery to retrieve datasets from the GEO database for my
analysis. Most of the datasets I need were submitted without raw data,
so I have to rely on the SOFT files and obtain the expression set
directly with the "getGEO" command. When I plot the expression values,
none of the datasets I have tried appear to be preprocessed
(background corrected, normalized): I see negative expression values,
and the data do not look normalized or log transformed (my apologies
if I am wrong).

For example:
library(GEOquery)  # provides getGEO()
library(Biobase)   # provides exprs()
gse <- getGEO("GSE1984")
esetOrig <- gse$GSE1984_series_matrix.txt.gz
e <- exprs(esetOrig)
hist(e[,1], main="histogram of expression values",
     xlab="Untransformed expression values")
elog <- log2(e)  # negative values become NaN (with a warning)
hist(elog[,1], main="histogram of expression values",
     xlab="log2 transformed expression values")

hist(esetOrig)
The histograms clearly show that the arrays are not normalized. The
same is true for the GSE4465 dataset, among others.
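
The kind of check described above can also be done numerically. What
follows is only a minimal sketch in base R: summarize_arrays is a
hypothetical helper name, and the small matrix is synthetic data
standing in for exprs(esetOrig), not values from GSE1984.

```r
# Hypothetical helper: per-array summary of an expression matrix
# (rows = probesets, columns = arrays).
summarize_arrays <- function(e) {
  t(apply(e, 2, function(x) c(
    min        = min(x, na.rm = TRUE),
    median     = median(x, na.rm = TRUE),
    max        = max(x, na.rm = TRUE),
    n_negative = sum(x < 0, na.rm = TRUE)
  )))
}

# Synthetic stand-in for exprs(esetOrig); not real GEO data.
e <- cbind(a = c(-5, 10, 200, 4000),
           b = c(3, 50, 900, 20000))

s <- summarize_arrays(e)
print(s)
```

Maxima in the thousands together with negative minima would suggest
values that are not on a log scale, and medians that differ a lot
between arrays would suggest that no between-array normalization was
applied. This is a heuristic, not a definitive test.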

I am therefore assuming that the data returned in the expression set
by getGEO for datasets like these are just the raw intensity values,
summarized somehow at the probeset level but neither bg corrected nor
normalized between arrays. I would like to perform these steps one by
one on the eset. I searched the web for packages that do bg correction
and normalization on an eset. I did find normalize.ExpressionSet but
could not find a bg correction method for an eset. I think it may not
be possible to do a bg correction on an eset because, unlike an
AffyBatch object, it carries no spatial position information for
probes or probesets.
In case there is no bg correction method for an eset, please suggest
how I should proceed from an eset obtained via GEOquery to a bg
corrected, normalized eset.
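
To make the normalization step concrete, here is a rough sketch of
between-array quantile normalization on a probeset-level matrix, in
base R only. It is one possible approach under stated assumptions, not
an established recipe for GEO series matrices: quantile_normalize is a
hypothetical helper, the data are synthetic, flooring values below 1
before taking log2 is an arbitrary choice made only to avoid NaN, and
no background correction is attempted.

```r
# Hypothetical helper: quantile-normalize the columns of a matrix.
# Ties are broken by order of appearance (ties.method = "first"),
# a simplification compared with dedicated implementations.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  ref    <- rowMeans(sorted)            # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

# Synthetic stand-in for exprs(eset): 5 probesets x 4 arrays.
set.seed(42)
raw <- matrix(rexp(20, rate = 1 / 500), nrow = 5,
              dimnames = list(paste("probe", 1:5, sep = ""),
                              paste("array", 1:4, sep = "")))
raw[1, 1] <- -12                        # mimic a negative value

floored <- raw
floored[floored < 1] <- 1               # assumption: floor at 1
logged  <- log2(floored)
normed  <- quantile_normalize(logged)

print(round(normed, 2))
```

With a real ExpressionSet the matrix would come from exprs(eset) and
could be written back with exprs(eset) <- normed. The limma function
normalizeBetweenArrays and the preprocessCore function
normalize.quantiles also operate on a plain matrix and may be
preferable to a hand-written helper.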

I would greatly appreciate your help. Thank you

sessionInfo()

R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tools     stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
 [1] convert_1.21.1       marray_1.23.0        geneplotter_1.23.3   lattice_0.17-26
 [5] annotate_1.23.4      AnnotationDbi_1.7.20 genefilter_1.26.4    affyPLM_1.22.0
 [9] preprocessCore_1.7.9 gcrma_2.17.4         affy_1.23.12         GEOquery_2.11.2
[13] RCurl_1.2-1          bitops_1.0-4.1       limma_3.0.3          Biobase_2.5.8

loaded via a namespace (and not attached):
 [1] affyio_1.13.5      Biostrings_2.13.54 DBI_0.2-4          grid_2.10.0        IRanges_1.3.99
 [6] RColorBrewer_1.0-2 RSQLite_0.7-3      splines_2.10.0     survival_2.35-7    xtable_1.5-5


