[BioC] limma design question

Fri Dec 5 21:32:22 CET 2008

Hi Gordon,

I've been out for a while and finally read your detailed reply. 
Thanks so much - it really helps clarify things for me!!

Cheers,
Jenny

At 05:55 PM 11/27/2008, Gordon K Smyth wrote:
>Hi Jenny,
>
>Should blocks be fixed (in the design matrix) or treated as random 
>(and hence enter the covariance matrix as correlations)?  This 
>question has a long history in mathematical statistics, so long that 
>you can be sure that the answer is somewhat subtle.
>
>Neither approach is right or wrong.  The random approach makes more 
>assumptions and allows you, in some circumstances, to extract more 
>information.  The limma approach with dupcor etc makes even more 
>assumptions than classical random effects models.  If the blocks are 
>treated as fixed, then treatments can only be compared within 
>blocks.  If blocks are treated as random, then it is possible to 
>compare treatments between blocks as well as within.
>
>So the first key issue is whether treatment comparisons are made 
>between blocks or within blocks.
>
>Suppose you do an experiment on random samples of subjects from two 
>groups, in which each subject is subjected to several tests.  The 
>subjects are blocks. The total sum of squares can be divided into 
>between-subject and within-subject sums of squares.  In other words, 
>the information in the data can be divided into a between-subject 
>error stratum and a within-subject error stratum.
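The split into between-subject and within-subject sums of squares can be checked numerically. The following is only a minimal sketch: the subject labels and measurements are made-up illustrative values, and `ss_decomposition` is a hypothetical helper, not part of limma or any other package.

```python
# Hypothetical data: 4 subjects (blocks), each measured on 3 tests.
data = {
    "s1": [5.1, 6.0, 5.5],
    "s2": [7.2, 7.9, 7.4],
    "s3": [4.8, 5.9, 5.0],
    "s4": [6.5, 6.8, 7.1],
}


def mean(values):
    return sum(values) / len(values)


def ss_decomposition(blocks):
    """Return (total, between-block, within-block) sums of squares."""
    all_values = [v for vals in blocks.values() for v in vals]
    grand = mean(all_values)
    # Total variation of every observation about the grand mean.
    total = sum((v - grand) ** 2 for v in all_values)
    # Variation of block means about the grand mean, weighted by block size.
    between = sum(len(vals) * (mean(vals) - grand) ** 2
                  for vals in blocks.values())
    # Variation of observations about their own block mean.
    within = sum((v - mean(vals)) ** 2
                 for vals in blocks.values() for v in vals)
    return total, between, within


total, between, within = ss_decomposition(data)
print(total, between, within)
```

Up to floating-point rounding, `total` equals `between + within`, which is the sense in which the information divides into two strata.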
>
>Suppose you want to compare the two groups.  All the information is 
>in the between-subject error stratum.  You cannot do any statistical 
>test unless you treat the subjects as random.
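One way to see why the group comparison is untestable with fixed subjects: the group indicator is an exact sum of subject indicators, so the group effect is confounded with the subject effects. The sketch below assumes a hypothetical layout (four subjects, two per group, three tests each); all names are illustrative.

```python
# Hypothetical layout: 4 subjects, 2 per group, 3 tests each.
subjects = ["s1", "s2", "s3", "s4"]
group_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
tests_per_subject = 3

# One row per observation, recording which subject it came from.
rows = [s for s in subjects for _ in range(tests_per_subject)]

# Indicator (0/1) column for each subject, as in a fixed-block design matrix.
subject_cols = {s: [1 if r == s else 0 for r in rows] for s in subjects}

# Indicator column for membership of group B.
group_col = [1 if group_of[r] == "B" else 0 for r in rows]

# Sum of the indicator columns of the subjects that belong to group B.
sum_of_b_subjects = [
    sum(subject_cols[s][i] for s in subjects if group_of[s] == "B")
    for i in range(len(rows))
]

# True when the group column is a linear combination of subject columns.
aliased = group_col == sum_of_b_subjects
print(aliased)
```

Because the group column adds nothing beyond the subject columns, a model with fixed subject effects has no separate degree of freedom left for the group difference.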
>
>Suppose now you want to compare the treatments.  If the experiment 
>is balanced (all subjects do all tests), then all the information 
>about the treatments is in the within-block stratum.  So you may as 
>well treat the subjects as fixed effects (as for example is done in 
>a paired t-test).
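The link between fixed subject effects and the paired t-test can be illustrated for the simplest balanced case of two treatments per subject. This is a sketch on made-up numbers, with hypothetical function names: it computes the paired t-statistic directly, then again from an additive model with one fixed effect per subject plus a treatment effect.

```python
import math

# Hypothetical paired data: (control, treated) for each of 5 subjects.
pairs = [(5.1, 6.0), (7.2, 7.9), (4.8, 5.9), (6.5, 6.8), (5.9, 7.1)]


def paired_t(pairs):
    """Classical paired t-statistic from within-subject differences."""
    diffs = [b - a for a, b in pairs]
    n = len(diffs)
    dbar = sum(diffs) / n
    var = sum((d - dbar) ** 2 for d in diffs) / (n - 1)
    return dbar / math.sqrt(var / n)


def fixed_block_t(pairs):
    """t-statistic for treatment in an additive subject + treatment model."""
    n = len(pairs)
    grand = sum(a + b for a, b in pairs) / (2 * n)
    subj_means = [(a + b) / 2 for a, b in pairs]
    trt_means = [sum(a for a, _ in pairs) / n, sum(b for _, b in pairs) / n]
    # In a balanced layout the least-squares fit is
    # subject mean + treatment mean - grand mean.
    rss = 0.0
    for (a, b), sm in zip(pairs, subj_means):
        for y, tm in zip((a, b), trt_means):
            rss += (y - (sm + tm - grand)) ** 2
    df = 2 * n - (n + 1)  # observations minus (n subject + 1 treatment) parameters
    mse = rss / df
    se = math.sqrt(mse * 2 / n)  # standard error of the treatment difference
    return (trt_means[1] - trt_means[0]) / se


t_paired = paired_t(pairs)
t_fixed = fixed_block_t(pairs)
print(t_paired, t_fixed)
```

The two statistics agree up to rounding, and both have n - 1 residual degrees of freedom, so in this balanced case nothing is lost by treating the subjects as fixed.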
>
>If the experiment is unbalanced (each subject does only a subset of 
>the tests, or subjects do tests different numbers of times), then 
>you can extract more information about the treatment comparisons 
>from the between-subject error stratum.  To do this, you have to 
>treat the blocks as random.
>
>The second key issue to consider is whether it makes sense 
>scientifically to treat the blocks as random.  If there are only two 
>or three blocks, then there is little to be gained by treating them 
>as random.  If the blocks have large unpredictable effects, then it 
>is much safer to treat them as fixed.  If you want to draw specific 
>conclusions about each of the blocks, then it doesn't make sense to 
>treat them as random.  In general, random is natural if there are 
>lots of blocks with relatively small effects that are not of interest 
>in themselves.  Sometimes you can go either way.
>
>Hope this helps
>Gordon
>
>On Tue, 25 Nov 2008, Jenny Drnevich wrote:
>
>>Hi Jim,
>>
>>I've seen you suggest this way to account for blocks by fitting 
>>extra columns in the design matrix before. I'm just wondering how 
>>this differs from the suggestion in the limma vignette (Section 8.2 
>>Technical Replication) to use duplicateCorrelation() to determine 
>>the average correlation within blocks. I know they are not 
>>mathematically equivalent; the coefficients for the treatment 
>>groups are slightly different, they use different DF, and the 
>>p-values tend to be larger using the duplicateCorrelation() method 
>>(at least for the one experiment I'm using). So, is one more 
>>"correct" than the other? Or are blocks of technical replicates 
>>different somehow than blocks of patients or cell lines, etc.?
>>
>>Thanks,
>>Jenny
>
>Jenny Drnevich, Ph.D.
>
>Functional Genomics Bioinformatics Specialist
>W.M. Keck Center for Comparative and Functional Genomics
>Roy J. Carver Biotechnology Center
>University of Illinois, Urbana-Champaign
>
>330 ERML
>1201 W. Gregory Dr.
>Urbana, IL 61801
>USA
>
>ph: 217-244-7355
>fax: 217-265-5066
>e-mail: drnevich at illinois.edu