[BioC] limma - Considering Batch Effect

Nathan S. Haigh n.haigh at sheffield.ac.uk
Wed Aug 8 09:05:56 CEST 2007


James MacDonald wrote:
> Hi Nathan,
>
>
> Nathan Haigh wrote:
>> I know this topic has arisen a few times from searching the mail
>> archive, but I'm still not clear on what/how to do this.
>>
>> Here is my experimental setup:
>> Array    Param1   Param2   Rep   Batch
>> 1        H        4        1     1
>> 2        H        4        2     2
>> 3        H        4        3     2
>> 4        H        20       1     1
>> 5        H        20       2     2
>> 6        H        20       3     2
>> 7        H        37       1     1
>> 8        H        37       2     2
>> 9        H        37       3     2
>> 10       L        4        1     1
>> 11       L        4        2     2
>> 12       L        4        3     2
>> 13       L        20       1     1
>> 14       L        20       2     2
>> 15       L        20       3     2
>> 16       L        37       1     1
>> 17       L        37       2     2
>> 18       L        37       3     2
>>
>> All rep 1's were processed in batch 1 and reps 2 and 3 were processes in
>> batch 2. From using GeneSpring, and conducting a cluster analysis on
>> experimental parameters, there is a clear batch effect. In GeneSpring, I
>> normalised the arrays, such that gene1 from array1 was divided by the
>> median of gene1 over arrays from batch1. Likewise, gene1 from array2 was
>> divided by the median of gene1 across arrays in batch2. These
>> successfully removed the batch effect. I'd like to be able to remove the
>> batch effect doing something similar, or accounting for it in limma.
>>
>> In essence, I want to remove the variability in the reps caused by the
>> batch effect.
>>
>> Could anyone explain how this can/should be done?
>
> Sure. Just add the Batch effect to your design matrix. For instance,
> if you want to model the differences in Param1, you would do something
> like:
>
> > design <- model.matrix(~0+factor(rep(1:2,
> each=9))+factor(rep(rep(1:2,1:2),6)))
> > colnames(design) <- c("H","L","Batch")
> > design
>    H L Batch
> 1  1 0     0
> 2  1 0     1
> 3  1 0     1
> 4  1 0     0
> 5  1 0     1
> 6  1 0     1
> 7  1 0     0
> 8  1 0     1
> 9  1 0     1
> 10 0 1     0
> 11 0 1     1
> 12 0 1     1
> 13 0 1     0
> 14 0 1     1
> 15 0 1     1
> 16 0 1     0
> 17 0 1     1
> 18 0 1     1
> attr(,"assign")
> [1] 1 1 2
> attr(,"contrasts")
> attr(,"contrasts")$`factor(rep(1:2, each = 9))`
> [1] "contr.treatment"
>
> attr(,"contrasts")$`factor(rep(rep(1:2, 1:2), 6))`
> [1] "contr.treatment"
>
> Then you can do the contrast between H and L.
>
> Best,
>
> Jim
>
>
>

Hi Jim,

Thanks very much for your help! I'm still trying to get my head around
the creation of the design matrix - it doesn't seem too intuitive to a
newbie. Although I'm sure all will become clear as soon as the penny drops!

Just to clarify something about the experimental design and hopefully
help me understand the construction of the design matrix. Param2 is a
temperature treatment, while Param1 is a "sample type". So it makes
sense to do contrasts between H.20 and H.4 to get cold-shock genes in H
samples, H.20 and H.37 to get heat-shock genes in H samples. How would I
set up the matrix for this? Also, in the context of this experimental
design, what does a contrast between H and L actually show?

Cheers
Nathan



More information about the Bioconductor mailing list