[BioC] unbalanced design

Gordon Smyth smyth at wehi.edu.au
Thu Dec 9 00:23:44 CET 2004

>Date: Mon, 6 Dec 2004 18:11:31 +0100
>From: <Arne.Muller at aventis.com>
>Subject: [BioC] unbalanced design
>To: <bioconductor at stat.math.ethz.ch>
>Dear All,
>I'm wondering which method to use to analyse a rather unbalanced design 
>(Affy). I've three studies (S1..S3) in which several doses of a drug have 
>been tested. The table blow gives the number of observations (the 
>replicates, cell cultures) per study/dose combination:
>                 S1      S2      S3
>0.00mM  4       3       3
>0.01mM  0       0       3
>0.10mM  3       3       0
>0.25mM  2       3       3
>0.50mM  3       3       0
>1.00mM  3       3       3
>For the moment the entire experiment is normalized all together 
>I expect a very strong study effect (different laboratory protocols have 
>been used!), and the dose effect should be mainly independent (small 
>interaction). I think lme would be the right choice, since the study 
>effect is an (unwanted) random effect, but I'm not sure using it for an 
>unbalanced design. I'd use:
>value ~ dose, random = ~ 1|study

I'd it as in Section 10.3 of the limma User's Guide. That is equivalent to 
the randomized block model you indicate here, but with 'hard' smoothing of 
the block correlations between genes.

>In addition I'd like to test the interaction (I don't expect much 
>interaction),  with a fixed effects model:
>value ~ study + dose + study:dose
>Again, I'm not sure whether this model would be meaningful because of the 
>unbalanced design (especially with repsect to testing interactions).
>I'm interested whether there's a general dose effect, i.e. genes 
>consistently altered across  studies.
>Would limma handle such a design, or are there other packages that could 
>be used for this (e.g. testing for a trend or dose dependence across studies)?

Any software that can fit linear models can do it, depending on what 
moderation or shrinkage method you want to apply to the tests. The lack of 
balance is not particular a problem except that only 6 out of 10 possible 
degrees of freedom for interaction are estimable. The limma method would be:

design <- model.matrix(~ study + dose + study:dose)
fit <- lmFit(eset, design)

limma will report that 4 coefficients are non-estimable, and will tell you 
which ones these are. Use a contrasts matrix to pick out the 6 interaction 
coefficients that are estimable, run eBayes() on the reduced fit, then the 
fit will contain moderated F-statistics and corresponding p-values for 
testing interaction.


>I'd be happy for any discussion and comments.
>         kind regards,
>         Arne

More information about the Bioconductor mailing list