[BioC] limma design question
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Nov 28 00:55:11 CET 2008
Should blocks be treated as fixed (entered in the design matrix) or as random
(hence entering the covariance matrix as correlations)? This question has a long
history in mathematical statistics, so long that you can be sure that the
answer is somewhat subtle.
Neither approach is right or wrong. The random approach makes more
assumptions and allows you, in some circumstances, to extract more
information. The limma approach with duplicateCorrelation() etc. makes even more
assumptions than classical random effects models. If the blocks are
treated as fixed, then treatments can only be compared within blocks. If
blocks are treated as random, then it is possible to compare treatments
between blocks as well as within.
So the first key issue is whether treatment comparisons are made between
blocks or within blocks.
Suppose you do an experiment on random samples of subjects from two
groups, in which each subject is subjected to several tests. The subjects
are blocks. The total sums of squares can be divided into between and
within-subject sums of squares. In other words, the information in the
data can be divided into a between-subject error stratum and a
within-subject error stratum.
Suppose you want to compare the two groups. All the information is in the
between-subject error stratum. You cannot do any statistical test unless
you treat the subjects as random.
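As a rough illustration of the split into between-subject and within-subject
sums of squares described above, here is a minimal sketch in plain Python.
The subject labels and measurements are made up for the example, and the
helper name ss_decomposition is hypothetical; nothing here comes from limma.

```python
# Hypothetical data: 4 subjects (blocks), 3 measurements on each.
data = {
    "s1": [5.1, 5.4, 4.9],
    "s2": [6.8, 7.1, 7.0],
    "s3": [4.2, 4.0, 4.5],
    "s4": [6.0, 6.3, 5.8],
}


def mean(xs):
    return sum(xs) / len(xs)


def ss_decomposition(blocks):
    """Return (total, between-block, within-block) sums of squares."""
    all_values = [x for xs in blocks.values() for x in xs]
    grand = mean(all_values)
    # Total variation around the grand mean.
    ss_total = sum((x - grand) ** 2 for x in all_values)
    # Variation of the block means around the grand mean.
    ss_between = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in blocks.values())
    # Variation of each observation around its own block mean.
    ss_within = sum((x - mean(xs)) ** 2 for xs in blocks.values() for x in xs)
    return ss_total, ss_between, ss_within


ss_total, ss_between, ss_within = ss_decomposition(data)
print("total  :", ss_total)
print("between:", ss_between)
print("within :", ss_within)
```

The between and within parts add up to the total, which is the sense in
which the information in the data splits into two strata.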
Suppose now you want to compare the treatments. If the experiment is
balanced (all subjects do all tests), then all the information about the
treatments is in the within-block stratum. So you may as well treat the
subjects as fixed effects (as for example is done in a paired t-test).
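To make the paired t-test remark concrete, here is a small sketch in plain
Python, assuming a balanced layout in which every subject is measured under
two treatments. The numbers and the function name paired_t are invented for
the example. It shows that the statistic depends only on within-subject
differences, so arbitrary subject effects drop out.

```python
import math

# Hypothetical balanced data: each subject measured under treatments A and B.
a = [5.1, 6.8, 4.2, 6.0, 5.5]
b = [5.9, 7.4, 4.6, 6.9, 6.1]


def paired_t(x, y):
    """Paired t statistic computed from the within-subject differences y - x."""
    d = [yi - xi for xi, yi in zip(x, y)]
    n = len(d)
    dbar = sum(d) / n
    var = sum((di - dbar) ** 2 for di in d) / (n - 1)
    return dbar / math.sqrt(var / n)


t = paired_t(a, b)

# Add an arbitrary offset to each subject (a stand-in for a block effect).
offsets = [10.0, -3.0, 0.5, 7.0, -1.5]
a_shift = [x + o for x, o in zip(a, offsets)]
b_shift = [y + o for y, o in zip(b, offsets)]
t_shift = paired_t(a_shift, b_shift)

print("t without subject offsets:", t)
print("t with subject offsets   :", t_shift)
```

The two statistics agree up to rounding error: however large the subject
effects are, they cancel in the differences, which is why nothing is lost by
treating subjects as fixed in the balanced case.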
If the experiment is unbalanced (each subject does only a subset of the
tests, subjects do tests a different number of times), then you can
extract more information about the treatment comparisons from the
between-subject error stratum. To do this, you have to treat the blocks as
random.
The second key issue to consider is whether it makes sense scientifically
to treat the blocks as random. If there are only two or three blocks,
then there is little to be gained by treating them as random. If the
blocks have large unpredictable effects, then it is much safer to treat
them as fixed. If you want to make specific conclusions about each of the
blocks, then it doesn't make sense to treat them as random. In general,
treating blocks as random is natural if there are lots of blocks with
relatively small effects that are not of interest in themselves. Sometimes
you can go either way.
Hope this helps
On Tue, 25 Nov 2008, Jenny Drnevich wrote:
> Hi Jim,
> I've seen you suggest this way to account for blocks by fitting extra
> columns in the design matrix before. I'm just wondering how this differs from
> the suggestion in the limma vignette (Section 8.2 Technical Replication) to
> use duplicateCorrelation() to determine the average correlation between
> blocks. I know they are not mathematically equivalent; the coefficients for
> the treatment groups are slightly different, they use different DF, and the
> p-values tend to be larger using the duplicateCorrelation() method (at least
> for the one experiment I'm using). So, is one more "correct" than the other?
> Or are blocks of technical replicates different somehow than blocks of
> patients or cell lines, etc.?