[BioC] technical reps, limma - theory
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Jul 21 00:44:47 CEST 2005
The limma duplicateCorrelation() function fits the random effects model as a general mixed model
by maximising the REML likelihood. This is the same as lme() and lmer() etc. This is equivalent
to the ANOVA approach in the balanced case, although not otherwise.
Once you get to lmFit(), a contraint has been placed on the random effects model (common
within-block correlation across genes) to reduce it to a single error component model at the
genewise level. So by this time limma is already fitting a different model to the usual genewise
random effects approach. The residual degrees of freedom returned by lmFit() are correct for this
single error component approach.
I am sorry that the theory behind the limma approach to technical reps has not yet been written up
and published. I am trying to find time to do this. The theory is closely analogous to the
within-array duplicate spot theory which is published in Bioinformatics.
> Date: Tue, 19 Jul 2005 13:13:50 -0400
> From: Naomi Altman <naomi at stat.psu.edu>
> Subject: [BioC] technical reps, limma - theory
> To: bioconductor at stat.math.ethz.ch
> Consider the simple one-way design, with biological and technical
> reps. Generally we would consider the biological reps to be blocks. (For
> simplicity, we can think of this as a one-channel analysis). The usual
> ANOVA seems to be at odds with what we get from limma (ignoring the eBayes
> step, which clearly cannot be recovered from the classical treatment).
> The usual ANOVA (t treatments, b biological reps, n technical reps within
> biological reps, giving ntb arrays)
> source df MS F
> treatment t-1 MS(T) MS(T)/MS(B)
> bio rep b-1 MS(B) MS(B)/MSE (although we don't
> really care about this)
> error=T*B ntb-t-b+1 MSE
> However, the Limma manual suggests using duplicateCorrelation and block to
> handle block designs, and this gives a different ANOVA. In particular, the
> error d.f. for this ANOVA is ntb-t. Using this method, the within block
> correlation is used in computing the t-statistics for the treatment, so you
> do not get the simple 1-way ANOVA that would come from ignoring block, but
> you cannot recover the p-value from the usual ANOVA, either.
> If you put in bio rep as a fixed factor, then Limma will use the MSE as the
> denominator for the contrast tests, so this also does not recover the ANOVA.
> I have not tried this computation with replicate spots (only replicate
> arrays) but either:
> 1) the usual ANOVA is right and duplicateCorrelation is doing something odd
> 2) duplicate Correlation is right and I don't understand the usual ANOVA
> 3) both methods are correct for somewhat different models, and I don't
> understand the statistical implications of this
> I am not discounting 3 - as the statisticians in the crowd know, there are
> 2 versions, constrained and unconstrained, for the simplest case of
> balanced ANOVA with fixed and random effects which lead to different
> p-values for some effects and both are defensible for almost any data set.
> Anyways, I would like to understand this better. So, I would welcome comments.
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111
More information about the Bioconductor