[BioC] Normalized microarray data and meta-analysis

Thu Dec 18 00:30:22 CET 2008

Scott and Thomas,

First of all, thanks so much for your prompt replies!  I'm sorry I
didn't include more details about precisely what I was trying to do.

What I'm trying to do is find those genes that are "consistently
differentially expressed" in my experimental condition of interest.  To
do this, I'm largely following the approach of Mulligan et al., 2006,
PNAS 103(16), 6368-73.  They calculated an effect size (Cohen's
d-statistic, which is twice the t-statistic for the untreated vs.
treated comparison, divided by the square root of the degrees of
freedom) for all genes in multiple different experiments.  They then
took the average d-statistic across all experiments and used a z-test
to determine whether the mean effect size differed from 0.  Following
multiple testing adjustment, genes with a p-value of <0.05 were
considered consistently differentially expressed.
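
To make the calculation above concrete, here is a minimal Python sketch
of it for a single gene, using only the standard library.  Several
pieces are assumptions rather than details taken from the paper: the
standard error of the mean d (each d is treated as having variance of
roughly 4/df under the null), the choice of Benjamini-Hochberg for the
multiple testing adjustment, and all function names and example
numbers, which are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d_from_t(t, df):
    """Effect size as described above: d = 2 * t / sqrt(df)."""
    return 2.0 * t / math.sqrt(df)


def mean_effect_z_test(t_stats, dfs):
    """Average d across experiments and test whether the mean differs from 0.

    ASSUMPTION: under the null each t has variance close to 1, so each d
    has variance close to 4 / df. The paper may use a different standard
    error; this is only one plausible choice.
    """
    ds = [cohens_d_from_t(t, df) for t, df in zip(t_stats, dfs)]
    k = len(ds)
    mean_d = sum(ds) / k
    se = math.sqrt(sum(4.0 / df for df in dfs)) / k
    z = mean_d / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return mean_d, z, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (ASSUMED adjustment method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene t-statistics from three experiments,
# with hypothetical degrees of freedom for each experiment.
dfs = [10, 8, 14]
genes = {
    "geneA": [4.1, 3.2, 5.0],
    "geneB": [0.3, -0.8, 0.5],
    "geneC": [-2.9, -3.5, -2.2],
}
raw_p = [mean_effect_z_test(ts, dfs)[2] for ts in genes.values()]
for name, adj in zip(genes, benjamini_hochberg(raw_p)):
    print(name, "consistent" if adj < 0.05 else "not consistent")
```

Note that this works entirely from per-experiment t-statistics and
degrees of freedom, so it never compares expression values from
different labs directly.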

Do you think I need raw data for this?  Unfortunately, one of the groups
whose experiment I'm trying to use has lost its raw data, so I cannot
obtain raw data for all experiments.  I understand that I'm making
assumptions about the quality of the arrays, but apart from that, do
you think this is a reasonable approach?

Thanks again in advance,

Wyatt

K. Wyatt McMahon, Ph.D.
Texas Tech University Health Sciences Center
Department of Internal Medicine
3601 4th St. 
Lubbock, TX - 79430
806-743-4072
"It's been a good year in the lab when three things work. . . and one of
those is the lights." - Tom Maniatis

> -----Original Message-----
> From: Thomas Hampton [mailto:Thomas.H.Hampton at Dartmouth.edu]
> Sent: Wednesday, December 17, 2008 5:02 PM
> To: Mcmahon, Kevin
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Normalized microarray data and meta-analysis
> 
> The question, I think, has to do with what sort of comparisons you
> plan to make. When people normalize using RMA, each slide ends up with
> a common distribution -- the only variable being how the elements of
> the distribution map to probes on any given slide. This is already
> some pretty hairy normalization, but it seems to work ok for lining up
> arrays done by the same people at the same time and place so that you
> can meaningfully compare expression values head to head, calculate
> averages, and do significance tests.
> 
> With or without raw data, the idea of a meaningful direct comparison
> between, say, an expression value of 7.5 in one lab and an expression
> value of 8.3 in another seems very optimistic to me.
>
> Saying something like gene X was in the top 1% in expression in both
> cases seems pretty reasonable...
> 
> Tom
> 
> 
> On Dec 17, 2008, at 5:31 PM, Mcmahon, Kevin wrote:
> 
> > Hello Bioconductor-inos,
> >
> > I have more of a statistical/philosophical question regarding using
> > raw vs. normalized data in a microarray meta-analysis.  I've looked
> > through the Bioconductor archives and have found some discussion of
> > this issue, but not exactly what I'm concerned with.  I don't mean to
> > waste anyone's time, but I was hoping I could get some help here.
> >
> > I've performed a meta-analysis using the downloaded data from 3
> > different GEO data sets (GDS).  It is my understanding that these are
> > normalized data from the various microarray experiments.  It seems to
> > me that if the data from those normalized results are normally
> > distributed, those three experiments are perfectly comparable (if you
> > think the authors' respective normalization approaches were
> > reasonable).  All you need to do is calculate some sort of effect
> > size, p-value, etc. for all genes in the experimental conditions of
> > interest and then combine these statistics across the different
> > experiments.  However, I consistently read things like "raw data are
> > required for a microarray meta-analysis."  Does this mean that
> > normalized data are not directly comparable with each other?  If so,
> > then why does GEO even host such data?
> >
> > Any help would be wonderful!
> >
> >
> >
> > Wyatt