[BioC] Almost inexisting overlap of diff. expr. genes found when comparing mas5 / rma

James W. MacDonald jmacdon at med.umich.edu
Fri Jul 8 21:21:40 CEST 2005

Emmanuel Levy wrote:
> Dear Bioconductor community,
> I've been looking for differentially expressed genes in C. elegans after a 
> drug treatment.
> There are 3 replicates of each condition and 2 conditions in total (WT and 
> Drug)
> I used limma combined with either rma or mas5. I find a very very poor 
> overlap in the results:
> - example (i) only 24 of the 100 most differentially expressed genes 
> obtained using rma are found in
> the 1000 most differentially expressed genes obtained using mas5
> - example (ii) only 183 genes are common to the lists of the 1000 most 
> differentially expressed genes
> found using both methods.
> (see piece of code at the end)

Unfortunately, this is a very common result. We recently did a study of 
7 different methods for Affy data, and found very poor overlap in the 
set of significant genes.
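
To make the overlap comparison concrete, here is a minimal sketch in base R. It is an illustration under stated assumptions, not the code from our study: top_ids, top_overlap, stat_a and stat_b are hypothetical names, and the data are simulated purely so the snippet runs.

```r
# Minimal sketch: overlap between the top-n lists from two pipelines.
# 'stat' is a hypothetical named numeric vector of per-probe statistics
# (for example moderated t), named by probe ID.
top_ids <- function(stat, n) {
  ranked <- names(stat)[order(abs(stat), decreasing = TRUE)]
  head(ranked, n)
}

# Number of probes shared by the top n_a of one list and the top n_b of the other.
top_overlap <- function(stat_a, stat_b, n_a, n_b = n_a) {
  length(intersect(top_ids(stat_a, n_a), top_ids(stat_b, n_b)))
}

# Synthetic example only: two noisy views of the same underlying effects.
set.seed(1)
probe_ids <- paste0("probe", 1:5000)
effect <- rnorm(5000)
stat_a <- setNames(effect + rnorm(5000), probe_ids)
stat_b <- setNames(effect + rnorm(5000), probe_ids)
top_overlap(stat_a, stat_b, n_a = 100, n_b = 1000)
```

With limma fits, the IDs could instead come from something like rownames(topTable(fit, coef = 2, number = n)), assuming the second column of the design is the treatment effect. Naming coef explicitly keeps the ranking on that effect rather than on the intercept.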


One problem with microarray data is the lack of 'true' measurements that 
can be used to objectively assess the results of any given method. 
Instead we are forced to judge the results based on ideas that may not 
be easily defended.

For instance, in that study we compared two different sample types 
using either t-tests or a Wilcoxon rank sum test, and favored the method 
that gave the most 'differentially expressed' genes at the lowest false 
discovery rate. I don't think you would have to argue very strenuously 
that this doesn't really prove one method is better than another.
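
As a rough sketch of that kind of count (not the code we actually used), the following base R function tallies how many genes pass a Benjamini-Hochberg FDR cutoff under either test. The name count_significant, the 0.05 cutoff, and the choice of BH adjustment are assumptions made for illustration. Note that with only three replicates per group an exact two-sided Wilcoxon test cannot give a p-value below 0.1, so this comparison only makes sense for larger groups.

```r
# Count genes called significant at a given FDR by a per-gene two-group test.
# expr:  genes x samples numeric matrix on the log scale
# group: two-level grouping of the columns of expr
count_significant <- function(expr, group, test = c("t", "wilcoxon"), fdr = 0.05) {
  test <- match.arg(test)
  group <- factor(group)
  stopifnot(nlevels(group) == 2, length(group) == ncol(expr))
  g1 <- group == levels(group)[1]
  p <- apply(expr, 1, function(x) {
    if (test == "t") {
      t.test(x[g1], x[!g1])$p.value            # Welch t-test
    } else {
      # normal approximation, so ties do not trigger warnings
      wilcox.test(x[g1], x[!g1], exact = FALSE)$p.value
    }
  })
  # Benjamini-Hochberg adjustment, then count genes under the cutoff
  sum(p.adjust(p, method = "BH") <= fdr)
}
```

Running this on the output of each preprocessing method and comparing the counts reproduces the flavor of the comparison, along with its weakness: a larger count is not evidence of a more accurate method.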

We did this analysis because my colleagues argue against using the 
Affymetrix spike-in data to assess a method because you can always 
'tune' a method to work best with the spike-in data, without having any 
proof that it works well at all with 'real' data.

The only way I know to objectively test the different methods would be 
to take some samples, randomly select many (where many == thousands) 
genes to test using an agreed-upon 'gold standard' (qRT-PCR, most 
likely), then analyze the samples using Affy chips and see which method 
correlates best with the gold standard result. It would probably only 
take US$10,000 or so to do.
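
If such reference data existed, the final comparison step might look like the sketch below. Everything here is hypothetical: rank_methods is an illustrative name, the inputs are assumed to be log fold changes named by gene ID, and Spearman correlation is just one reasonable choice of agreement measure.

```r
# Rank preprocessing methods by agreement with a reference measurement.
# reference: named numeric vector of log fold changes from the gold
#            standard (e.g. qRT-PCR), named by gene ID
# estimates: named list of named numeric vectors, one per method
rank_methods <- function(reference, estimates, method = "spearman") {
  agreement <- sapply(estimates, function(est) {
    # compare only genes measured by both the reference and the method
    shared <- intersect(names(reference), names(est))
    cor(reference[shared], est[shared], method = method)
  })
  sort(agreement, decreasing = TRUE)
}
```

Called as rank_methods(pcr_lfc, list(rma = lfc_rma, mas5 = lfc_mas)), where all three vectors are placeholders for real measurements, it would return the methods ordered from best to worst agreement.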

In the interim the only recourse as I see it is to pick a favorite 
method (based on something suitably intangible) and stick with it ;-D.



> Either 
> 1/ I am missing something, which wouldn't surprise me, as my expertise 
> is very limited.
> In that case I am sorry for pointing out something irrelevant and thank you 
> in advance for telling
> me what I'm missing,
> 2/ The differences in the normalization methods are really at the origin of 
> the observed differences.
> In that case, how can I know which method is the best for my case study? 
> Does a helpful paper exist 
> which explains in simple words the strengths/weaknesses of each method?
> Thank you very much in advance for your help,
> Emmanuel
> -------------------------------------- CODE --------------------------------------
> library(affy)
> library(limma)
> # Load data into Affybatch
> data = ReadAffy(widget=T)
> # Background correction / normalization
> eset.rma = rma(data)
> eset.mas = mas5(data)
> # Get Expression values
> exp.rma = exprs(eset.rma)
> exp.mas = exprs(eset.mas)
> # --- Look for differentially expressed genes using Limma package
> strain = c("WT","WT","WT","Drug","Drug","Drug")
> design = model.matrix(~factor(strain))
> colnames(design) = c("WT","Drug")
> fit.rma = lmFit(eset.rma,design)
> fit.mas = lmFit(eset.mas,design)
> fit.rma.2 = eBayes(fit.rma)
> fit.mas.2 = eBayes(fit.mas)
> top.rma = as.numeric(rownames(topTable(fit.rma.2,n=100)))
> top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> length(intersect(top.rma,top.mas))
>>[1] 24
> top.rma = as.numeric(rownames(topTable(fit.rma.2,n=1000)))
> top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> length(intersect(top.rma,top.mas))
>>[1] 183

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
