[BioC] Contrasts for 3x2 factorial experiment in R/edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Wed Jan 4 00:23:34 CET 2012


Dear Ben,

Your design matrix is still using the sum-to-zero parametrization, which I 
find a bit unhelpful in the genomic context.  As I said in my previous 
reply, I will leave interpretations of that parametrization to you.

Best wishes
Gordon

On Tue, 3 Jan 2012, Benjamin King wrote:

> Dear Gordon,
>
> I very much appreciate your reply and I will follow your suggestion of 
> not using the sum-to-zero parameterization.
>
> I do have one additional question about interpreting the model 
> coefficients.  If my design matrix is as shown below and the model is 
> "~Strain+Strain:Treatment" is the correct interpretation of the 
> coefficients?
>
> Coefficient                Interpretation
> Intercept
> Strain1                       "Strain B compared to Strain A (i.e., Which genes are differentially expressed in Strain B compared to Strain A?)"
> Strain2                       "Strain C compared to Strain A  (i.e., Which genes are differentially expressed in Strain C compared to Strain A?)"
> Treatment1                "Overall treatment effect (i.e., Which genes respond to Stimulated treatment overall?)"
> Strain1:Treatment1  "Interaction between StrainB and Treatment (i.e., Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?)"
> Strain2:Treatment2  "Interaction between StrainC and Treatment (i.e., Which genes respond to Stimulated treatment differently in Strain C compared to Strain A?)"
>
>   (Intercept) Strain1 Strain2 Treatment1 Strain1:Treatment1 Strain2:Treatment1
> A.Un         1      -1      -1         -1                  1                  1
> A.Un         1      -1      -1         -1                  1                  1
> A.Un         1      -1      -1         -1                  1                  1
> B.Un         1       1       0         -1                 -1                  0
> B.Un         1       1       0         -1                 -1                  0
> B.Un         1       1       0         -1                 -1                  0
> B.Un         1       1       0         -1                 -1                  0
> C.Un         1       0       0         -1                  0                 -1
> C.Un         1       0       0         -1                  0                 -1
> C.Un         1       0       0         -1                  0                 -1
> C.Un         1       0       0         -1                  0                 -1
> A.Stim       1      -1      -1          1                 -1                 -1
> A.Stim       1      -1      -1          1                 -1                 -1
> A.Stim       1      -1      -1          1                 -1                 -1
> A.Stim       1      -1      -1          1                 -1                 -1
> B.Stim       1       1       0          1                  1                  0
> B.Stim       1       1       0          1                  1                  0
> B.Stim       1       1       0          1                  1                  0
> B.Stim       1       1       0          1                  1                  0
> C.Stim       1       0       1          1                  0                  1
> C.Stim       1       0       1          1                  0                  1
> C.Stim       1       0       1          1                  0                  1
> C.Stim       1       0       1          1                  0                  1
>
> If this is correct, then is the interaction between strain B and treatment the fifth coefficient or c(0,0,0,0,1,0)?
>
> Likewise, is the interaction between strain C and treatment the sixth coefficient or c(0,0,0,0,0,1)?
>
> Thank you,
>
> - Ben
>
>
> On Jan 2, 2012, at 6:16 PM, Gordon K Smyth wrote:
>
>> Dear Ben,
>>
>> glmLRT() allows you to specify the contrast to be tested in two ways.  If the contrast happens to correspond to a column of your design matrix, then it is one the original coefficients and you can just specify which one using the coef= argument.  Otherwise you can use the contrast= argument to specify any contrast, using a numerical vector of the same length as the number of coefficients.
>>
>> Suppose you used the default treatment parametrization:
>>
>>  contrasts(Strain) <- contr.treat(3)
>>  contrasts(Treatment) <- contr.treat(2)
>>
>> and fitted a model with Treatment nested within Strain:
>>
>>  design <- model.matrix(~Strain+Strain:Treatment)
>>
>> Then the coefficients 4:6 would be the logFC after stimulation in strains A to C respectively.  Hence you could test for differential expression after stimulation in strain B by:
>>
>>  glmLRT(dge,fit,coef=5)
>>
>> and so on.  You could equally well specify the same test by
>>
>>  glmLRT(dge,fit,contrast=c(0,0,0,0,1,0,0))
>>
>> To test for genes responding differently to simulation in strain B compared to strain A, you would use
>>
>>  contrast=c(0,0,0,-1,1,0)
>>
>> And so on.
>>
>> Figuring out the appropriate contrast vectors using the sum-to-zero 
>> parametrization specified by contr.sum() is more tedious, and I will 
>> leave that to you.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Tue, 20 Dec 2011 16:44:42 -0500
>>> From: Benjamin King <bking at mdibl.org>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] Contrasts for 3x2 factorial experiment in R/edgeR
>>>
>>> Hello,
>>>
>>> I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts.
>>>
>>> The design of my experiment has two factors, strain and treatment.  There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated).  I have four biological replicates (except for one sample group where there are only three).
>>>
>>> # Group   Strain  Treatment
>>> # A.Un     A           Un
>>> # B.Un     B           Un
>>> # C.Un    C           Un
>>> # A.Stim  A           Stim
>>> # B.Stim  B           Stim
>>> # C.Stim  C          Stim
>>>
>>> Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows.
>>>
>>> Intercept:  (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6
>>> Strain1:  (-A.Un + B.Un - A.Stim + B.Stim)/4
>>> Strain2:  (-A.Un + C.Un - A.Stim + C.Stim)/4
>>> Treatment1:  (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6
>>> Strain1:Treatment1:  (A.Un - B.Un - A.Stim + B.Stim)/4
>>> Strain2:Treatment2:  (A.Un - C.Un - A.Stim + C.Stim)/4
>>>
>>> I'm interested in these questions:
>>>
>>> 1. Which genes are differentially expressed in Strain B compared to Strain A?
>>> 2. Which genes are differentially expressed in Strain C compared to Strain A?
>>> 3. Which genes respond to Stimulated treatment overall?
>>> 4. Which genes respond to Stimulated treatment in Strain B?
>>> 5. Which genes respond to Stimulated treatment in Strain C?
>>> 6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
>>> 7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
>>>
>>> I believe questions 1, 2, 3, 6 and 7 are model coefficients.  If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function?
>>>
>>> For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function?
>>>
>>> Below is my current script and session information.
>>>
>>> Thank you in advance for your help,
>>>
>>> - Ben
>>>
>>>
>>>
>>> library(edgeR)
>>>
>>> # 3x2 Factorial Design
>>> # Strain  Treatment
>>> # A       Un
>>> # B       Un
>>> # C       Un
>>> # A       Stim
>>> # B       Stim
>>> # C       Stim
>>>
>>> ## Specify factors
>>> Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A","A","A","A","B","B","B","B","C","C","C","C"))
>>> Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim"))
>>>
>>> ## Read in count data
>>> raw.data <- read.table("group_counts.txt",sep="\t",header=T)
>>> d <- raw.data[, 2:24]
>>> rownames(d) <- raw.data[, 1]
>>>
>>> # make DGE object
>>> dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B.Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.Stim","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C.Stim","C.Stim"))
>>> dge <- calcNormFactors(dge)
>>>
>>> # filter uninformative genes
>>> m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size)
>>> ridx <- rowSums(m > 1) >= 2
>>> dge <- dge[ridx,]
>>>
>>> ## Design matrix
>>> # treatment-contrast parameterization
>>> contrasts(Strain) <- contr.sum(3)
>>> contrasts(Treatment) <- contr.sum(2)
>>> design <- model.matrix(~Strain*Treatment)
>>>
>>> ## Estimate Dispersion
>>> dge <- estimateGLMCommonDisp(dge, design)
>>>
>>> ## Fit model with Common Dispersion
>>> fit <- glmFit(dge, design, dispersion=dge$common.dispersion)
>>>
>>>
>>>
>>>
>>>> sessionInfo()
>>> R version 2.13.1 (2011-07-08)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] edgeR_2.2.5
>>>
>>> loaded via a namespace (and not attached):
>>> [1] limma_3.8.3  tools_2.13.1
>>
>
>
>
>
>
>
>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list