[BioC] limma design question

Fri Nov 28 00:55:11 CET 2008

Hi Jenny,

Should blocks be fixed (in the design matrix) or treated as random (and 
hence enter the covariance matrix as correlations)?  This question has a 
long history in mathematical statistics, so long that you can be sure that 
the answer is somewhat subtle.

Neither approach is right or wrong.  The random approach makes more 
assumptions and allows you, in some circumstances, to extract more 
information.  The limma approach with duplicateCorrelation() etc. makes 
even more assumptions than classical random effects models.  If the blocks 
are treated as fixed, then treatments can only be compared within blocks.  If 
blocks are treated as random, then it is possible to compare treatments 
between blocks as well as within.

So the first key issue is whether treatment comparisons are made between 
blocks or within blocks.

Suppose you do an experiment on random samples of subjects from two 
groups, in which each subject is subjected to several tests.  The subjects 
are blocks.  The total sum of squares can be divided into between-subject 
and within-subject sums of squares.  In other words, the information in the 
data can be divided into a between-subject error stratum and a 
within-subject error stratum.
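
To make the split concrete, here is a minimal sketch in plain Python.  The 
numbers and subject names are made up purely for illustration (they are 
not from any real experiment), and this is only the textbook 
sum-of-squares identity, not anything limma does internally.

```python
# Hypothetical data: 3 subjects (blocks), each measured on 3 tests.
data = {
    "s1": [5.0, 6.0, 7.0],
    "s2": [8.0, 9.0, 10.0],
    "s3": [2.0, 4.0, 3.0],
}


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


all_values = [v for vals in data.values() for v in vals]
grand_mean = mean(all_values)

# Total variation around the grand mean.
total_ss = sum((v - grand_mean) ** 2 for v in all_values)

# Between-subject stratum: variation of subject means around the grand mean,
# weighted by the number of observations per subject.
between_ss = sum(
    len(vals) * (mean(vals) - grand_mean) ** 2 for vals in data.values()
)

# Within-subject stratum: variation of observations around their own
# subject mean.
within_ss = sum(
    (v - mean(vals)) ** 2 for vals in data.values() for v in vals
)

print(total_ss, between_ss, within_ss)
```

For these made-up numbers the between-subject and within-subject sums of 
squares add up to the total, which is the sense in which the information 
divides into two strata.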

Suppose you want to compare the two groups.  All the information is in the 
between-subject error stratum.  You cannot do any statistical test unless 
you treat the subjects as random.

Suppose now you want to compare the treatments (the tests).  If the 
experiment is balanced (all subjects do all tests), then all the 
information about the treatments is in the within-block stratum.  So you 
may as well treat the subjects as fixed effects (as is done, for example, 
in a paired t-test).
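
A small sketch of the balanced case, again in plain Python with made-up 
numbers.  The function names and the subject offsets are hypothetical and 
chosen only for illustration.  It shows that the paired (within-subject) 
estimate of a treatment difference does not change when arbitrary subject 
effects are added, which is why nothing is lost by treating subjects as 
fixed here.

```python
# Hypothetical balanced data: every subject does both tests, A and B.
pairs = {"s1": (5.0, 7.0), "s2": (9.0, 10.0), "s3": (2.0, 5.0)}


def paired_estimate(pairs):
    """Mean within-subject difference B - A, as in a paired t-test."""
    diffs = [b - a for a, b in pairs.values()]
    return sum(diffs) / len(diffs)


def add_block_effects(pairs, offsets):
    """Shift both observations of each subject by that subject's offset."""
    return {s: (a + offsets[s], b + offsets[s]) for s, (a, b) in pairs.items()}


estimate = paired_estimate(pairs)

# Arbitrary (illustrative) subject effects: they cancel within each pair.
shifted = add_block_effects(pairs, {"s1": 100.0, "s2": -50.0, "s3": 3.5})
shifted_estimate = paired_estimate(shifted)

print(estimate, shifted_estimate)
```

Both estimates agree, because each subject's own level cancels out of its 
within-subject difference.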

If the experiment is unbalanced (each subject does only a subset of the 
tests, or subjects do tests a different number of times), then you can 
extract more information about the treatment comparisons from the 
between-subject error stratum.  To do this, you have to treat the blocks 
as random.

The second key issue to consider is whether it makes sense scientifically 
to treat the blocks as random.  If there are only two or three blocks, 
then there is little to be gained by treating them as random.  If the 
blocks have large unpredictable effects, then it is much safer to treat 
them as fixed.  If you want to make specific conclusions about each of the 
blocks, then it doesn't make sense to treat them as random.  In general, 
random is natural if there are lots of blocks with relatively small 
effects that are not of interest in themselves.  Sometimes you can go 
either way.

Hope this helps
Gordon

On Tue, 25 Nov 2008, Jenny Drnevich wrote:

> Hi Jim,
>
> I've seen you suggest this way to account for blocks by fitting extra 
> columns in the design matrix before. I'm just wondering how this differs from 
> the suggestion in the limma vignette (Section 8.2 Technical Replication) to 
> use duplicateCorrelation() to determine the average correlation between 
> blocks. I know they are not mathematically equivalent; the coefficients for 
> the treatment groups are slightly different, they use different DF, and the 
> p-values tend to be larger using the duplicateCorrelation() method (at least 
> for the one experiment I'm using). So, is one more "correct" than the other? 
> Or are blocks of technical replicates different somehow than blocks of 
> patients or cell lines, etc.?
>
> Thanks,
> Jenny