[BioC] Almost inexisting overlap of diff. expr. genes found when comparing mas5 / rma
naomi at stat.psu.edu
Mon Sep 5 04:07:45 CEST 2005
The problem is considerable.
We found the same thing when we followed RMA exactly until the median
polish step, and substituted Huber's biweight for median polish. This
produces a tiny difference in the expression values, and the same 40-50%
overlap in the gene lists.
Such are the limitations of the methodology at this point.
At 04:09 AM 7/9/2005, Adaikalavan Ramasamy wrote:
>Yes, we often see poor overlaps. A 40-50% overlap is considered
>pretty good, but it is rare unless you are considering only the top 5 genes
>in both lists or something silly like that.
>To make a fair comparison, try comparing the lists when they are
>both filtered by the same p-value cutoff or statistic rather than
>by an arbitrarily chosen number of genes.
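
A minimal sketch of the "same cutoff" comparison suggested above, in base R only. The helper name overlap_at_cutoff and the toy p-values are hypothetical, made up purely for illustration; with limma, the raw p-values would presumably be taken from the full table for the coefficient of interest.

```r
# Hypothetical helper: select genes in each list at the same adjusted
# p-value cutoff, then count how many are shared.
# p1, p2: named numeric vectors of raw p-values (names = probe IDs).
overlap_at_cutoff <- function(p1, p2, cutoff = 0.05, method = "BH") {
  sel1 <- names(p1)[p.adjust(p1, method = method) <= cutoff]
  sel2 <- names(p2)[p.adjust(p2, method = method) <= cutoff]
  c(n1 = length(sel1), n2 = length(sel2),
    common = length(intersect(sel1, sel2)))
}

# Toy p-values (not real data), one vector per preprocessing method
p.a <- c(g1 = 0.0001, g2 = 0.0004, g3 = 0.02,  g4 = 0.60, g5 = 0.90)
p.b <- c(g1 = 0.0002, g2 = 0.20,   g3 = 0.001, g4 = 0.70, g5 = 0.80)

res <- overlap_at_cutoff(p.a, p.b)
print(res)
```

Reporting the two list sizes next to the size of the intersection makes it clear whether a small overlap comes from disagreement or simply from one method selecting far fewer genes.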
>Further, two minor cosmetic points about your code
>1) If you look at your design matrix from
> strain = c("WT","WT","WT","Drug","Drug","Drug")
> design = model.matrix(~factor(strain))
> colnames(design) = c("WT","Drug")
>  WT Drug
>1  1    1
>2  1    1
>3  1    1
>4  1    0
>5  1    0
>6  1    0
>the first column represents an intercept not WT. To get the
>correct interpretation, you need to change the second line to
> design = model.matrix(~ -1 + factor(strain) )
>2) You do not need to force the rownames to numeric using
>as.numeric(), since intersect() happily works with characters.
> x <- c("a", "b", "c")
> y <- c("b", "c", "d")
> intersect(x, y)
> [1] "b" "c"
>But I do not think either of these points changes your results.
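
The design-matrix point can be checked in base R. The sketch below only inspects column names; the object names design.int, design.grp and design.named are hypothetical. One thing worth noting: factor levels are sorted alphabetically, so even without an intercept the columns come out as Drug then WT, and a fixed colnames(design) = c("WT","Drug") would still mislabel them unless the level order is set explicitly.

```r
strain <- c("WT", "WT", "WT", "Drug", "Drug", "Drug")

# With an intercept: "Drug" (alphabetically first) is the reference level,
# so column 1 is the intercept and column 2 is the WT-vs-Drug contrast.
design.int <- model.matrix(~ factor(strain))
print(colnames(design.int))

# Without an intercept: one indicator column per level, Drug first.
design.grp <- model.matrix(~ -1 + factor(strain))
print(colnames(design.grp))

# Assumption for illustration: fix the level order explicitly and take
# the column names from the factor levels, so labels cannot drift.
f <- factor(strain, levels = c("WT", "Drug"))
design.named <- model.matrix(~ 0 + f)
colnames(design.named) <- levels(f)
print(design.named)
```

With the intercept parameterization, the second coefficient is the WT minus Drug difference, so it is the one to test for differential expression.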
>On Fri, 2005-07-08 at 18:18 +0100, Emmanuel Levy wrote:
> > Dear Bioconductor community,
> > I've been looking for differentially expressed genes in C. elegans after a
> > drug treatment.
> > There are 3 replicates of each condition and 2 conditions in total (WT and
> > Drug)
> > I used limma combined with either rma or mas5. I find a very very poor
> > overlap in the results:
> > - example (i) only 24 of the 100 most differentially expressed genes
> > obtained using rma are found in
> > the 1000 most differentially expressed genes obtained using mas5
> > - example (ii) only 183 genes are common to the lists of the 1000 most
> > differentially expressed genes
> > found using both methods.
> > (see piece of code at the end)
> > Either
> > 1/ I am missing something, which I wouldn't be surprised by, as my
> > is very limited.
> > In that case I am sorry for pointing out something irrelevant and thank
> > you in advance for telling
> > me what I'm missing,
> > 2/ The differences in the normalization methods are really at the origin of
> > the observed differences.
> > In that case, how can I know which method is the best for my case study?
> > Does a helpful paper exist
> > that explains in simple words the strengths/weaknesses of each method?
> > Thank you very much in advance for your help,
> > Emmanuel
> > -------------------------------------- CODE --------------------------------------
> > library(affy)
> > library(limma)
> > # Load data into Affybatch
> > data = ReadAffy(widget=TRUE)
> > # Background correction / normalization
> > eset.rma = rma(data)
> > eset.mas = mas5(data)
> > # Get Expression values
> > exp.rma = exprs(eset.rma)
> > exp.mas = exprs(eset.mas)
> > # --- Look for differentially expressed genes using Limma package
> > strain = c("WT","WT","WT","Drug","Drug","Drug")
> > design = model.matrix(~factor(strain))
> > colnames(design) = c("WT","Drug")
> > fit.rma = lmFit(eset.rma,design)
> > fit.mas = lmFit(eset.mas,design)
> > fit.rma.2 = eBayes(fit.rma)
> > fit.mas.2 = eBayes(fit.mas)
> > # Overlap of the top 1000 (rma) with the top 100 (mas5)
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=1000)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=100)))
> > length(intersect(top.rma,top.mas))
> > [1] 24
> > # Overlap of the top 100 (rma) with the top 1000 (mas5)
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=100)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> > length(intersect(top.rma,top.mas))
> > [1] 0
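
The top-n comparison in the quoted code can also be written with probe IDs kept as character names, so no as.numeric() conversion is involved. This is a minimal base-R sketch: top_overlap is a hypothetical helper, and the statistics are made-up toy values standing in for per-gene ranking scores from the two preprocessing methods.

```r
# Hypothetical helper: size of the overlap between the top n1 genes by
# stat1 and the top n2 genes by stat2 (larger statistic = higher rank).
# stat1, stat2: named numeric vectors, names = probe IDs (characters).
top_overlap <- function(stat1, stat2, n1, n2) {
  top1 <- head(names(sort(stat1, decreasing = TRUE)), n1)
  top2 <- head(names(sort(stat2, decreasing = TRUE)), n2)
  length(intersect(top1, top2))
}

# Toy ranking statistics (not real data)
s.rma <- c(a = 5, b = 4, c = 3, d = 2, e = 1)
s.mas <- c(a = 3, b = 5, c = 1, d = 4, e = 2)

ov.2.3 <- top_overlap(s.rma, s.mas, n1 = 2, n2 = 3)  # top 2 vs top 3
ov.3.2 <- top_overlap(s.rma, s.mas, n1 = 3, n2 = 2)  # top 3 vs top 2
print(c(ov.2.3, ov.3.2))
```

Passing the two list sizes as explicit arguments makes it harder to swap them by accident, and the two calls show that the comparison is not symmetric in n1 and n2.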
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
Naomi S. Altman 814-865-3791 (voice)
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111