[BioC] Experimental Design, Model Matrix and Contrasts

Mon Jul 21 00:55:03 CEST 2008

Hi Nathan,
How about adding the contamination factor to your original design, do  
the first lmFit, then then fit exactly the same contrasts as you have  
specified. The effect of contamination will be taken care of in the  
linear model, then you are left with 'cleaner' estimates of the C/T  
d60/67 parameters, which your contrasts can pick out.

cheers,
Mark

On 17/07/2008, at 9:59 AM, <Nathan.Watson-Haigh at csiro.au> <Nathan.Watson-Haigh at csiro.au 
 > wrote:

> I'm getting myself confused with setting up these things for a limma
> analysis. Please stop me and correct me if I'm wrong with any of this!
>
> I have two factors each with two levels:
> Treatment - "C" and "T"
> Gestation - "d60" and "d67"
>
> I setup up my model and design matrix as follows:
>
> TG<-paste(pData(eSet)$Treatment, pData(eSet)$Gestation, sep=".")
> TG<-factor(TG,levels=unique(TG))
> design<-model.matrix(~0+TG)
> colnames(design)<-levels(TG)
>
> which gave the following design matrix:
>
> C.d60 T.d60 C.d67 T.d67
> 1 1 0 0 0
> 2 1 0 0 0
> 3 1 0 0 0
> 4 1 0 0 0
> 5 0 1 0 0
> 6 0 1 0 0
> 7 0 1 0 0
> 8 0 1 0 0
> 9 0 1 0 0
> 10 0 0 1 0
> 11 0 0 1 0
> 12 0 0 1 0
> 13 0 0 1 0
> 14 0 0 0 1
> 15 0 0 0 1
> 16 0 0 0 1
> attr(,"assign")
> [1] 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$TG
> [1] "contr.treatment"
>
> So I can use contrasts to pull out comparisons of Control Day 60  
> (C.d60)
> versus Treatment Day 60 (T.d60), Control Day 67 (C.d67) versus  
> Treatment
> Day 67 (T.d67), the treatment main effect as well as the interaction
> terms using the following:
>
> cont.matrix<-makeContrasts(
> Treatment_d60 = T.d60 - C.d60,
> Treatment_d67 = T.d67 - C.d67,
> Treatment = (T.d60+T.d67) - (C.d60+C.d67),
> Diff = (T.d67-C.d67) - (T.d60-C.d60),
> levels=design
> )
>
> Which gives the following contrast matrix:
>
> Contrasts
> Levels Treatment_d60 Treatment_d67 Treatment Diff
> C.d60 -1 0 -1 1
> T.d60 1 0 1 -1
> C.d67 0 -1 -1 -1
> T.d67 0 1 1 1
>
> I've now realised that there is contamination in some of my samples  
> and
> would like to include this in the model so I can try to remove it's
> effect and try to increase the power to detect differentially  
> expressed
> genes for the contrasts show above. However I'm not clear on how to
> proceed. I don't think the above approach can be extended to include  
> the
> contamination effect. So I though something like this might work/be
> correct:
>
> Treatment<-factor(targets$Treatment, levels=unique(targets$Treatment))
> Gestation<-factor(targets$Gestation, levels=unique(targets$Gestation))
> Contamination<-factor(targets$Contamination,
> levels=unique(targets$Contamination))
> design<-model.matrix(~0+Treatment*Gestation+Contamination)
>
> Which give the following design matrix:
>   TreatmentC TreatmentT Gestationd67 Contamination1
> TreatmentT:Gestationd67
> 1           1          0            0              0
> 0
> 2           1          0            0              1
> 0
> 3           1          0            0              0
> 0
> 4           1          0            0              1
> 0
> 5           0          1            0              0
> 0
> 6           0          1            0              1
> 0
> 7           0          1            0              1
> 0
> 8           0          1            0              1
> 0
> 9           0          1            0              0
> 0
> 10          1          0            1              0
> 0
> 11          1          0            1              0
> 0
> 12          1          0            1              0
> 0
> 13          1          0            1              1
> 0
> 14          0          1            1              0
> 1
> 15          0          1            1              0
> 1
> 16          0          1            1              0
> 1
> attr(,"assign")
> [1] 1 1 2 3 4
> attr(,"contrasts")
> attr(,"contrasts")$Treatment
> [1] "contr.treatment"
>
> attr(,"contrasts")$Gestation
> [1] "contr.treatment"
>
> attr(,"contrasts")$Contamination
> [1] "contr.treatment"
>
>
> I now have a few questions:
> 1) When specifying the model, what's the difference between using
> ~Treatment*Gestation and ~0+Treatment*Gestation and how does the alter
> the interpretation of the design matrix?
> 2) Using the above design matrix (assuming it is correct/the best  
> way to
> go forward), how would I get the Treatment effect, Gestation effect,
> interaction, as well as the treatment effect for d60 Gestation and the
> treatment effect for d67?
> 3) Out of interest, how would I find out the genes affected by the
> contamination?
>
> I'm somewhat confused, so if I'm making any serious errors, please let
> me know!
> Nathan
>
> -------------------------------------------------------------
> Dr. Nathan S. Watson-Haigh        (publish under Haigh, N.S.)
> OCE Post Doctoral Fellow
> CSIRO Livestock Industries
> J M Rendel Laboratory
> Rockhampton
> QLD 4701                              Tel: +61 (0)7 4923 8121
> Australia                             Fax: +61 (0)7 4923 8222
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor