[BioC] Using limma to analyze GEO datasets/series from two-channel experiments

Sat Oct 3 00:00:08 CEST 2009

Dear Gurus,
I am attempting to analyze a bunch of microarray experiments from the
GEO database.

I have experience with Affymetrix chips, so it was reasonably simple
to download the datasets/series of interest, retrieve the relevant
columns from the GSM files (after figuring out whether they were
normalized, log-transformed, etc.), and perform the comparisons I need
using limma.

Now I am struggling to do the same for other platforms, in particular,
two-color platforms.
The first few such experiments I have looked at seem reasonably
simple.  However, I haven't been able to figure out how to obtain,
from the GSM files, a data structure that lmFit can use.

I decided to try the GEOquery package to interface with GEO.

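library(GEOquery)         # provides getGEO
library(Biobase)          # provides the exprs accessor
gse <- getGEO("GSE2998")  # list of ExpressionSet objects
exprs <- exprs(gse[[1]])  # matrix with one column per GSM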

The exprs matrix now contains the VALUE column from each GSM file,
which in this particular case is "The log2-transformed ratio of the
Lowess-normalized fluorescence values (Ch2/Ch1) exported from
GeneTraffic".
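
To make clear what I understand that column to hold, here is a tiny
sketch with made-up intensities (the numbers and the names ch1, ch2
and log_ratio are hypothetical, not taken from the GSM files). It only
illustrates the log2(Ch2/Ch1) arithmetic, not the Lowess step:

```r
# Made-up, already-normalized intensities for three probes
ch1 <- c(500, 1000, 2000)   # channel 1
ch2 <- c(1000, 1000, 500)   # channel 2

# One log2 ratio per probe: positive means higher in channel 2
log_ratio <- log2(ch2 / ch1)
print(log_ratio)  # 1 0 -2
```

Only the ratio survives in this column; the two separate channel
intensities cannot be recovered from it.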

For one of the comparisons that I am interested in, there are two
chips of relevance.
GSM65523, with treated Cy3 and untreated Cy5
GSM65567, with treated Cy3 and untreated Cy5

I thought that the best way to compare treated to untreated would be
something like:
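library(limma)  # provides modelMatrix and lmFit
targets <- matrix(c("GSM65523", "noHS", "HS",
                    "GSM65567", "noHS", "HS"), ncol=3, byrow=TRUE,
                  dimnames=list(NULL, c("SlideNumber", "Cy3", "Cy5")))
design <- modelMatrix(targets, ref="noHS")
# keep only the two arrays listed in targets, in the same order
fit <- lmFit(exprs[, targets[, "SlideNumber"]], design)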

But, of course, exprs doesn't contain any channel info, just the log
ratio between the channels.
Should I be retrieving different columns from the GSM files? How can I
build a data structure from that data that lmFit can use? Is there a
better way to do simple comparisons of two-channel GEO datasets?
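
In case it helps to show what I am picturing, below is a rough base-R
sketch of the comparison done by hand on toy numbers. It assumes that
each value is log2(Cy5/Cy3), which I have not verified for this
series, and the matrix M, the gene names and the sign convention are
all my own illustration rather than anything from limma or GEO:

```r
# Targets as in my attempt above (hypothetical layout)
targets <- data.frame(
  SlideNumber = c("GSM65523", "GSM65567"),
  Cy3 = c("noHS", "noHS"),
  Cy5 = c("HS", "HS"),
  stringsAsFactors = FALSE
)

# Assumed convention: +1 when HS is in Cy5, -1 when the dyes are swapped
design <- matrix(ifelse(targets$Cy5 == "HS", 1, -1), ncol = 1,
                 dimnames = list(targets$SlideNumber, "HS"))

# Toy log-ratios: rows are genes, columns are arrays
M <- matrix(c( 1.0,  1.2,
              -0.5, -0.3), nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), targets$SlideNumber))

# Least-squares estimate of the HS vs noHS log fold change per gene;
# with a single +1/-1 column this is a sign-adjusted mean across arrays
coef <- M %*% design / sum(design^2)
print(coef)
```

With both arrays in the same orientation this reduces to the per-gene
mean of the log-ratios, which is why I wonder whether the VALUE column
alone is enough.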

Thank you so much for any help you can provide!
Best,
Ana