[BioC] DESeq2 multiple factors nested design

Michael Love michaelisaiahlove at gmail.com
Fri Mar 28 01:10:33 CET 2014

```hi Isobel,

On Thu, Mar 27, 2014 at 1:41 PM, Isobel [guest] <guest at bioconductor.org> wrote:
> My problem is in finding a way of accounting for clone within the model in DESeq. In an ideal world I would like to incorporate “clone” as a factor in the model, to account for differences between samples resulting from “clone”, so in the first instance I’d like to compare the following:
>
> ~ clone + species + diet + species:diet
> ~ clone + species + diet
>
> However, I think because “clone” is nested within “species” so any given level of clone is only present within one biotype, simply putting clone into the design formula of DESeq2 does not work,

Note that, because the clones are entirely within a single species,
the species effect is simply a linear combination of the clone
effects. I'll show this with an example using normal data and a linear
model:

R> species <- factor(rep(1:2,each=4))
R> clone <- factor(rep(1:4,each=2))
R> cbind(species, clone)
species clone
[1,]       1     1
[2,]       1     1
[3,]       1     2
[4,]       1     2
[5,]       2     3
[6,]       2     3
[7,]       2     4
[8,]       2     4

R> set.seed(1)
R> y <- rnorm(8)
R> coef(lm(y ~ species))
(Intercept)    species2
0.07921043  0.10448786

The species effect is 0.10448786, which we get by only including species.

If we run the model with only clone:

R> res <- coef(lm(y ~ clone))
R> names(res)
[1] "(Intercept)" "clone2"      "clone3"      "clone4"

The species effect is the average of clone 3 and clone 4 effect minus
the average of clone 2 and clone 1 effect (clone 1 is absorbed by the
intercept in this case, so 0):

R> unname(.5*res["clone3"] + .5*res["clone4"] - (.5*res["clone2"]))
[1] 0.1044879

> I am interested in genes differentially expressed between diet A and diet B
> in genes differentially expressed between species,
> and in particular I am interested in identifying genes where differential expression between diets varies in relation to species (i.e. the diet-species interaction).

So then, getting back to how to accomplish these comparisons in
DESeq2. I'd like to give advice using DESeq2 version >= 1.3, as this
is simpler in the new version, and it will be released very soon (on
April 14). If you can try out this code with the development branch,
that would be great, if not, just email me and I can formulate this
for version 1.2.

For DESeq2 version >= 1.3, I would specify:

design(dds) <- ~ clone + diet + clone:diet

Then the following comparisons:

comparison of diet B over diet A:

results(dds, contrast=c("diet","B","A"))

comparison of species 2 over species 1, this is the average of the
clone effects for each species, so:

ctrst <- numeric(resultsNames(dds))
sp2clones <- paste0("clone",dds\$clone[dds\$species == "2"])
sp1clones <- paste0("clone",dds\$clone[dds\$species == "1"])
ctrst[sp2clones] <- 1/5
ctrst[sp1clones] <- -1/5
results(dds, contast=ctrst)

for the interaction between species 1 and diet, we pull out the
interaction effects and average for each clone:

ctrst <- numeric(resultsNames(dds))
sp1dietA <- paste0("clone",dds\$clone[dds\$species == "1"],".dietA")
sp1dietB <- paste0("clone",dds\$clone[dds\$species == "1"],".dietB")
ctrst[sp1dietB] <- 1/5
ctrst[sp1dietA] <- -1/5
results(dds, contast=ctrst)

Mike

```