[BioC] Contrasts for 3x2 factorial experiment in R/edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Tue Jan 3 00:16:40 CET 2012


Dear Ben,

glmLRT() allows you to specify the contrast to be tested in two ways.  If 
the contrast happens to correspond to a column of your design matrix, then 
it is one the original coefficients and you can just specify which one 
using the coef= argument.  Otherwise you can use the contrast= argument to 
specify any contrast, using a numerical vector of the same length as the 
number of coefficients.

Suppose you used the default treatment parametrization:

   contrasts(Strain) <- contr.treat(3)
   contrasts(Treatment) <- contr.treat(2)

and fitted a model with Treatment nested within Strain:

   design <- model.matrix(~Strain+Strain:Treatment)

Then the coefficients 4:6 would be the logFC after stimulation in strains 
A to C respectively.  Hence you could test for differential expression 
after stimulation in strain B by:

   glmLRT(dge,fit,coef=5)

and so on.  You could equally well specify the same test by

   glmLRT(dge,fit,contrast=c(0,0,0,0,1,0,0))

To test for genes responding differently to simulation in strain B 
compared to strain A, you would use

   contrast=c(0,0,0,-1,1,0)

And so on.

Figuring out the appropriate contrast vectors using the sum-to-zero 
parametrization specified by contr.sum() is more tedious, and I will leave 
that to you.

Best wishes
Gordon

> Date: Tue, 20 Dec 2011 16:44:42 -0500
> From: Benjamin King <bking at mdibl.org>
> To: bioconductor at r-project.org
> Subject: [BioC] Contrasts for 3x2 factorial experiment in R/edgeR
>
> Hello,
>
> I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts.
>
> The design of my experiment has two factors, strain and treatment.  There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated).  I have four biological replicates (except for one sample group where there are only three).
>
> # Group   Strain  Treatment
> # A.Un     A           Un
> # B.Un     B           Un
> # C.Un    C           Un
> # A.Stim  A           Stim
> # B.Stim  B           Stim
> # C.Stim  C          Stim
>
> Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows.
>
> Intercept:  (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6
> Strain1:  (-A.Un + B.Un - A.Stim + B.Stim)/4
> Strain2:  (-A.Un + C.Un - A.Stim + C.Stim)/4
> Treatment1:  (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6
> Strain1:Treatment1:  (A.Un - B.Un - A.Stim + B.Stim)/4
> Strain2:Treatment2:  (A.Un - C.Un - A.Stim + C.Stim)/4
>
> I'm interested in these questions:
>
> 1. Which genes are differentially expressed in Strain B compared to Strain A?
> 2. Which genes are differentially expressed in Strain C compared to Strain A?
> 3. Which genes respond to Stimulated treatment overall?
> 4. Which genes respond to Stimulated treatment in Strain B?
> 5. Which genes respond to Stimulated treatment in Strain C?
> 6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
> 7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
>
> I believe questions 1, 2, 3, 6 and 7 are model coefficients.  If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function?
>
> For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function?
>
> Below is my current script and session information.
>
> Thank you in advance for your help,
>
> - Ben
>
>
>
> library(edgeR)
>
> # 3x2 Factorial Design
> # Strain  Treatment
> # A       Un
> # B       Un
> # C       Un
> # A       Stim
> # B       Stim
> # C       Stim
>
> ## Specify factors
> Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A","A","A","A","B","B","B","B","C","C","C","C"))
> Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim"))
>
> ## Read in count data
> raw.data <- read.table("group_counts.txt",sep="\t",header=T)
> d <- raw.data[, 2:24]
> rownames(d) <- raw.data[, 1]
>
> # make DGE object
> dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B.Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.Stim","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C.Stim","C.Stim"))
> dge <- calcNormFactors(dge)
>
> # filter uninformative genes
> m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size)
> ridx <- rowSums(m > 1) >= 2
> dge <- dge[ridx,]
>
> ## Design matrix
> # treatment-contrast parameterization
> contrasts(Strain) <- contr.sum(3)
> contrasts(Treatment) <- contr.sum(2)
> design <- model.matrix(~Strain*Treatment)
>
> ## Estimate Dispersion
> dge <- estimateGLMCommonDisp(dge, design)
>
> ## Fit model with Common Dispersion
> fit <- glmFit(dge, design, dispersion=dge$common.dispersion)
>
>
>
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_2.2.5
>
> loaded via a namespace (and not attached):
> [1] limma_3.8.3  tools_2.13.1

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list