[BioC] ANOVA, SAM and Limma
naomi at stat.psu.edu
Fri Jun 25 19:42:14 CEST 2004
1. I did one simulation for each of 6 conditions: 3 levels of differential
expression crossed with 2 error distributions. That is why I say I did not
replicate my simulations.
2. Limma is gene-by-gene ANOVA with an adjusted (moderated) denominator, in
which each gene's residual variance is shrunk toward a common value
estimated from all the genes. Ordinary ANOVA had higher false positive and
false negative rates (as determined from the simulation) than limma or SAM,
even after the FDR adjustment.
3. This applies to all of the methods. The ordinary ANOVA was poor. Limma
and SAM "use all the genes" in the shrinkage estimate, and in my small study
they were more powerful than ordinary ANOVA, but even they missed most of
the differentially expressing genes.
4. I am not sure I understand your comment about q-values. The estimate of
pi_0 was pretty good in all cases, including when it was based on the
p-values from the ANOVA F-test. I then called genes with q < 0.01 significant
and looked at the false positive and false negative rates at that cutoff.
When SAM came up with a shorter list of genes than limma, I compared the
q-values and found that SAM at q < 0.01 was comparable to limma at a smaller
q cutoff. I then looked at the number of false positives and false negatives.
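The q-value procedure in point 4 can be sketched as follows: estimate pi_0 from the proportion of large p-values, convert the p-values to q-values, then count false positives and false negatives against the known simulation truth at q < 0.01. This is a minimal sketch assuming Storey's estimator with a single tuning value lambda = 0.5 (the qvalue software can instead smooth over a grid of lambda values). The function names and toy data are hypothetical, and the sketch does not reproduce SAM's permutation-based q-values.

```python
# Minimal sketch: Storey-style pi_0, q-values, and error counts at a q-value cutoff.
# ASSUMPTIONS: one tuning value lam = 0.5; the p-values and the true DE status
# come from a simulation; all names are hypothetical, not from any package.


def estimate_pi0(pvals, lam=0.5):
    """pi_0 = #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))


def qvalues(pvals, pi0):
    """q_(i) = min over j >= i of pi0 * m * p_(j) / j, for p-values sorted ascending."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


def error_counts(qvals, is_de, cutoff=0.01):
    """(false positives, false negatives) when genes with q < cutoff are called DE."""
    fp = sum(q < cutoff and not de for q, de in zip(qvals, is_de))
    fn = sum(q >= cutoff and de for q, de in zip(qvals, is_de))
    return fp, fn


pvals = [0.001, 0.002, 0.03, 0.6, 0.8, 0.9]
is_de = [True, False, True, False, False, False]  # known from the simulation
pi0 = estimate_pi0(pvals)                         # 3 / (6 * 0.5) = 1.0
q = qvalues(pvals, pi0)
fp, fn = error_counts(q, is_de)                   # (1, 1)
```

The dependence on the prevalence of true results enters through pi_0: a smaller estimate of pi_0 scales the q-values down, so more genes pass a fixed cutoff.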
Lastly, I hope I made it clear that I was analyzing a completely randomized
one-way design. I used the default settings for one-way ANOVA in all of the
software. For limma, this means that I used Helmert contrasts to obtain the
ordinary and eBayes ANOVA F-tests.
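The difference between the ordinary and eBayes F-tests for one gene in a one-way design can be illustrated with a short sketch. The numerator is the ordinary between-group mean square; only the residual variance s^2 in the denominator is replaced by a weighted average of s^2 and a prior variance s0^2. This is not limma's implementation: the hyperparameters d0 and s0_sq are assumed values supplied by hand (limma's eBayes estimates them from all the genes), and the function names are hypothetical.

```python
# Minimal sketch (not limma's code): ordinary vs. eBayes-moderated one-way ANOVA F
# for a single gene. ASSUMPTION: the prior degrees of freedom d0 and prior variance
# s0_sq are passed in by hand; limma estimates them from all genes, not done here.
from statistics import mean


def anova_parts(groups):
    """Return (between-group mean square, residual variance s^2, df1, df2)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df1 = len(groups) - 1
    df2 = len(values) - len(groups)
    return ss_between / df1, ss_within / df2, df1, df2


def ordinary_f(groups):
    """Classical F = MS_between / s^2, on (df1, df2) degrees of freedom."""
    ms_between, s_sq, _, _ = anova_parts(groups)
    return ms_between / s_sq


def moderated_f(groups, d0, s0_sq):
    """Same numerator, but s^2 is replaced by the posterior variance
    (d0 * s0_sq + d * s^2) / (d0 + d); refer the result to F(df1, d0 + d)."""
    ms_between, s_sq, _, d = anova_parts(groups)
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return ms_between / s_tilde_sq


# Three groups with two replicates each; this gene has an unusually small
# within-group variance, which inflates the ordinary F.
gene = [[1.0, 1.01], [1.2, 1.21], [1.1, 1.11]]
f_ord = ordinary_f(gene)                       # about 400
f_mod = moderated_f(gene, d0=4.0, s0_sq=0.01)  # about 3.5
```

With d0 = 0 the moderated F reduces to the ordinary F. Because the overall F-test depends only on the space spanned by the between-group contrasts, Helmert or any other full-rank set of contrasts should give the same F statistic.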
At 01:17 PM 6/25/2004 -0400, Baker, Stephen wrote:
>I'm a little confused by your posting. Let me quote parts of your email
>and then ask for clarification:
> >... I did not replicate my simulations,...
> Does this mean you had only one simulation?
> >1. Gene-by-gene ANOVA is not as good as limma and SAM.
> What is meant by "good"?
> I thought what you did in limma was gene-by-gene ANOVA?
> >3. 2 replicates does not give you a whole lot of power, even when you
> >"borrow strength" by using all the genes. Most of the differentially
> >expressing genes were not "discovered".
> Is this meant for all methods?
> >SAM's q-value estimate is more conservative,
> >but both are somewhat conservative. Most of the differences in results
> >appear to be differences in the estimated q-values, which were computed
> >from the p-values in limma and directly from the permutations in SAM.
> Aren't q-values a form of FDR and hence a function of the
>prevalence of true results?
> Aren't the p-values from limma from ANOVA which are "uniformly
>most powerful" if assumptions hold? Since q-values are based on p-values
>your result would be consistent with theory.
>One thing I find confusing is when a program/package name is cited
>instead of the specific statistical method applied. This may seem a
>minor point but it is insufficient when programs or packages have
>multiple options that could be used to do the same analysis.
>Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625
>Sr. Biostatistician- IS Bioinformatics Unit
>Lecturer in Biostatistics (775) 254-4885 fax
>Graduate School of Biomedical Sciences
>University of Massachusetts Medical School, Worcester
>55 Lake Avenue North stephen.baker at umassmed.edu
>Worcester, MA 01655 USA
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
Naomi S. Altman 814-865-3791 (voice)
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111