[BioC] ANOVA, SAM and Limma

Fri Jun 25 19:42:14 CEST 2004

Dear Stephen,

1. I ran one simulation for each of 6 conditions: 3 levels of 
differential expression crossed with 2 error distributions.  That is why I 
say this is "heuristic".

2. Limma is gene-by-gene ANOVA with an adjusted denominator.  Ordinary 
ANOVA had a higher false positive and false negative rate (as determined 
from the simulation) than limma or SAM even after using the FDR adjustment.

3. The ordinary ANOVA was poor.  Limma and SAM "use all the genes" in the 
shrinkage estimate.  They were more powerful in my small study than 
ordinary ANOVA, but they missed most of the differentially expressing genes.

4. I am not sure I understand your comment about q-values.  The estimate of 
pi_0 was pretty good in all cases, including using the p-values from the 
ANOVA F-test.  I then selected q<.01 and looked at the false positive and 
false negative rates for genes with q<.01.  When SAM came up with a smaller 
list of genes than limma, I compared the q-values and found that SAM with 
q<.01 was comparable to limma with a smaller value of q.  I then looked at 
the number of false positives and false negatives.
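
To make the q<.01 comparison concrete, here is a minimal sketch of how 
q-values can be computed from a list of p-values and then scored against the 
known truth in a simulation.  It assumes a Storey-type estimate of pi_0 with 
lambda = 0.5; the function names, the lambda value and the toy inputs are 
hypothetical illustrations, not the code or settings used in the simulation 
described above.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Assumed Storey-type estimate of the proportion of true nulls:
    the fraction of p-values above lam, rescaled by 1 / (1 - lam)."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / ((1.0 - lam) * m))


def qvalues(pvalues, pi0=None):
    """q-value for each p-value: pi0 * m * p / rank, made monotone in p."""
    m = len(pvalues)
    if pi0 is None:
        pi0 = estimate_pi0(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of the FDR estimates at its own and all larger thresholds.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


def error_counts(q, truly_de, cutoff=0.01):
    """Count false positives and false negatives among genes with q < cutoff,
    given the simulation's known differential-expression labels."""
    called = [qi < cutoff for qi in q]
    fp = sum(1 for c, t in zip(called, truly_de) if c and not t)
    fn = sum(1 for c, t in zip(called, truly_de) if not c and t)
    return fp, fn


# Toy input: four genes, two of which are truly differentially expressed.
toy_p = [0.001, 0.002, 0.6, 0.8]
toy_truth = [True, False, True, False]
toy_q = qvalues(toy_p)
toy_fp, toy_fn = error_counts(toy_q, toy_truth)
```

Because the q-value depends on pi_0 and on the whole set of p-values, the 
same q cutoff can select lists of different lengths when the p-values come 
from different procedures.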

Lastly, I hope that I was clear that I was analyzing a completely 
randomized one-way design.  I used the default settings for one-way ANOVA 
in all of the software.  For limma, this means that I used the Helmert 
contrasts to obtain the ordinary and eBayes ANOVA F-tests.
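
As an illustration of the difference between the ordinary and the eBayes 
F-test for a single gene, the sketch below computes both for a one-way 
layout.  The overall F-statistic does not depend on the contrast coding, so 
it works directly from the group means.  The moderated denominator is the 
usual shrinkage form (d0*s0^2 + d*s^2)/(d0 + d); in limma d0 and s0^2 are 
estimated from all the genes, whereas here they are simply passed in as 
hypothetical values, and the function name and data are made up.

```python
def one_way_f(groups, d0, s0_sq):
    """Return (ordinary F, moderated F) for one gene in a one-way design.

    groups : list of lists, the replicate values for each treatment
    d0     : prior degrees of freedom (hypothetical value supplied by caller)
    s0_sq  : prior variance (hypothetical value supplied by caller)
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-treatment and within-treatment sums of squares.
    ss_between = sum(len(g) * (mu - grand_mean) ** 2
                     for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2
                    for g, mu in zip(groups, means) for x in g)

    df_within = n - k
    ms_between = ss_between / (k - 1)
    s_sq = ss_within / df_within  # ordinary gene-wise variance

    # Adjusted denominator: gene-wise variance shrunk toward the prior value.
    s_sq_mod = (d0 * s0_sq + df_within * s_sq) / (d0 + df_within)

    return ms_between / s_sq, ms_between / s_sq_mod


# Toy gene: 3 treatments, 2 replicates each, with an unusually small variance.
toy_groups = [[1.0, 1.2], [2.0, 2.2], [3.0, 3.2]]
f_ordinary, f_moderated = one_way_f(toy_groups, d0=3.0, s0_sq=0.1)
```

In this toy case the gene-wise variance is much smaller than the prior 
value, so the moderated F is smaller than the ordinary F.  The moderated 
statistic is referred to an F distribution with d0 + d denominator degrees 
of freedom rather than d.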

--Naomi

At 01:17 PM 6/25/2004 -0400, Baker, Stephen wrote:
>Naomi,
>I'm a little confused by your posting. Let me quote parts of your email
>and then ask for clarification:
>
>
> >...  I did not replicate my simulations,...
>
>             Does this mean you had only one simulation?
>
>
> >1. Gene-by-gene ANOVA is not as good as limma and SAM.
>
>             What is meant by "good"?
>
>             I thought what you did in limma was gene-by-gene ANOVA?
>
>
> >3. 2 replicates does not give you a whole lot of power, even when you
> >"borrow strength" by using all the genes. Most of the differentially
> >expressing genes were not "discovered".
>
>             Is this meant for all methods?
>
> >SAM's q-value estimate is more conservative,
> >but both are somewhat conservative.  Most of the differences in results
> >appear to be differences in the estimated q-values, which were computed
> >from the p-values in limma and directly from the permutations in SAM.
>
>           Aren't q-values a form of FDR and hence a function of the
>prevalence of true results?
>
>           Aren't the p-values from limma from ANOVA which are "uniformly
>most powerful" if assumptions hold? Since q-values are based on p-values
>your result would be consistent with theory.
>
>
>One thing I find confusing is when a program/package name is cited
>instead of the specific statistical method applied. This may seem a
>minor point but it is insufficient when programs or packages have
>multiple options that could be used to do the same analysis.
>
>-.- -.. .---- .--. ..-.
>Stephen P. Baker, MScPH, PhD (ABD)         (508) 856-2625
>Sr. Biostatistician- IS Bioinformatics Unit
>Lecturer in Biostatistics                  (775) 254-4885 fax
>Graduate School of Biomedical Sciences
>University of Massachusetts Medical School, Worcester
>55 Lake Avenue North stephen.baker at umassmed.edu
>Worcester, MA 01655 USA
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111