[BioC] Almost inexisting overlap of diff. expr. genes found when comparing mas5 / rma

Mon Sep 5 04:07:45 CEST 2005

The problem is considerable.

We found the same thing when we followed RMA exactly up to the median 
polish step and then substituted Huber's biweight for median polish.  This 
changes the expression values only slightly, yet the resulting gene lists 
show the same 40-50% overlap.

Such are the limitations of the methodology at this point.
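
One way to put the two lists on an equal footing, along the lines suggested
in the reply quoted below, is to select genes with the same p-value cutoff
in both analyses and then count the overlap. A minimal sketch in base R,
assuming each analysis yields a vector of p-values named by probe-set ID;
overlap_at_cutoff and the toy p-values are hypothetical, for illustration
only, and are not part of limma or affy:

```r
# Hypothetical helper (not from limma/affy): overlap between two gene lists
# that are both selected with the same p-value cutoff.
overlap_at_cutoff <- function(p1, p2, cutoff = 0.01) {
  stopifnot(!is.null(names(p1)), !is.null(names(p2)))
  # which() drops NA p-values instead of propagating them
  sel1 <- names(p1)[which(p1 <= cutoff)]
  sel2 <- names(p2)[which(p2 <= cutoff)]
  common <- intersect(sel1, sel2)
  smaller <- min(length(sel1), length(sel2))
  list(n1 = length(sel1),
       n2 = length(sel2),
       common = common,
       # overlap as a fraction of the shorter list; NA if a list is empty
       frac_of_smaller = if (smaller > 0) length(common) / smaller else NA_real_)
}

# Made-up p-values keyed by gene ID (illustration only, not real results)
p.rma <- c(g1 = 0.001, g2 = 0.020, g3 = 0.004, g4 = 0.500)
p.mas <- c(g1 = 0.003, g2 = 0.008, g3 = 0.300, g4 = 0.009)

res <- overlap_at_cutoff(p.rma, p.mas, cutoff = 0.01)
res$common           # genes passing the cutoff in both analyses
res$frac_of_smaller  # overlap relative to the shorter list
```

With limma, the per-gene p-values could presumably be taken from the
p.value component of the object returned by eBayes, using the column for
the coefficient of interest.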

--Naomi

At 04:09 AM 7/9/2005, Adaikalavan Ramasamy wrote:
>Yes, we often see poor overlaps. A 40-50% overlap is considered
>pretty good, but it is rare unless you are considering only the top
>5 genes in both lists or something silly like that.
>
>To make a fair comparison, try comparing the lists when they are
>both filtered by the same p-value cutoff or statistic rather than
>by arbitrarily choosing a number of genes.
>
>
>Further, two minor cosmetic points about your code:
>
>1) If you look at your design matrix from
>
>  strain = c("WT","WT","WT","Drug","Drug","Drug")
>  design = model.matrix(~factor(strain))
>  colnames(design) = c("WT","Drug")
>  design
>   WT Drug
>1  1    1
>2  1    1
>3  1    1
>4  1    0
>5  1    0
>6  1    0
>
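>the first column represents an intercept, not WT. To get the
>correct interpretation, you need to change the second line to
>
>  design = model.matrix(~ -1 + factor(strain) )
>
>Note that factor() sorts its levels alphabetically, so the columns
>then come out in the order Drug, WT and the colnames line must be
>changed to match. Each coefficient is then a group mean, so the
>difference between the groups has to be extracted with a contrast.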
>
>
>2) You do not need to force the rownames to numeric using
>as.numeric(), since intersect happily works with characters.
>
>  x <- c("a", "b", "c")
>  y <- c("b", "c", "d")
>  intersect(x,y)
>[1] "b" "c"
>
>But I do not think either of these points changes your results.
>
>
>
>
>On Fri, 2005-07-08 at 18:18 +0100, Emmanuel Levy wrote:
> > Dear Bioconductor community,
> >
> > I've been looking for differentially expressed genes in C. elegans after a
> > drug treatment.
> > There are 3 replicates of each condition and 2 conditions in total (WT and
> > Drug).
> > I used limma combined with either rma or mas5. I find a very poor
> > overlap in the results:
> >
> > - example (i) only 24 of the 100 most differentially expressed genes
> > obtained using rma are found in
> > the 1000 most differentially expressed genes obtained using mas5
> > - example (ii) only 183 genes are common to the lists of the 1000 most
> > differentially expressed genes
> > found using both methods.
> > (see piece of code at the end)
> >
> > Either
> > 1/ I am missing something, which would not surprise me, as my expertise
> > is very limited.
> >
> > In that case I am sorry for pointing out something irrelevant, and thank you
> > in advance for telling me what I'm missing,
> >
> > 2/ The differences in the normalization methods are really at the origin of
> > the observed differences.
> > In that case, how can I know which method is the best for my case study?
> > Does a helpful paper exist
> > which explains in simple words the strengths/weaknesses of each method?
> >
> > Thank you very much in advance for your help,
> >
> > Emmanuel
> >
> > -------------------------------------- CODE --------------------------------------
> > library(affy)
> > library(limma)
> >
> > # Load data into Affybatch
> > data = ReadAffy(widget=TRUE)
> >
> > # Background correction / normalization
> > eset.rma = rma(data)
> > eset.mas = mas5(data)
> >
> > # Get Expression values
> > exp.rma = exprs(eset.rma)
> > exp.mas = exprs(eset.mas)
> >
> > # --- Look for differentially expressed genes using Limma package
> > strain = c("WT","WT","WT","Drug","Drug","Drug")
> > design = model.matrix(~factor(strain))
> > colnames(design) = c("WT","Drug")
> >
> > fit.rma = lmFit(eset.rma,design)
> > fit.mas = lmFit(eset.mas,design)
> >
> > fit.rma.2 = eBayes(fit.rma)
> > fit.mas.2 = eBayes(fit.mas)
> >
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=1000)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=100)))
> > length(intersect(top.rma,top.mas))
> > > [1] 24
> >
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=100)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> > length(intersect(top.rma,top.mas))
> > > [1] 0
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111