[BioC] DESeq: GLMs for RNA-Seq with interaction terms

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed Jul 13 22:17:42 CEST 2011


On Wed, Jul 13, 2011 at 3:56 PM, Eli Meyer <EliMeyer at mail.utexas.edu> wrote:
> I have a reasonably large RNA-Seq dataset for a non-model plant, and want to
> fit a GLM with interaction terms:
> expression ~ treatment + time + treatment:time
>
> In theory, I should be able to test the significance of each term by
> comparing a reduced model (dropping that term) with the full model above.
>
> It seems DESeq can handle this for main effect terms: it produces a vector
> of p-values with a reasonable distribution. See this example, using data
> from the pasilla package:
> data(pasillaGenes)
> design <- pData(pasillaGenes)[, c("condition", "type")]
> fullCountsTable <- counts(pasillaGenes)
> cdsFull <- newCountDataSet(fullCountsTable[1:1000,], design)
> cdsFull <- estimateSizeFactors(cdsFull)
> cdsFull <- estimateDispersions(cdsFull, method="pooled")
> fit1 <- fitNbinomGLMs(cdsFull, count ~ type + condition)
> fit0 <- fitNbinomGLMs(cdsFull, count ~ type)
> pvalsGLM <- nbinomGLMTest(fit1, fit0)
>
>
> However, when I try this with an interaction term included it gives clearly
> wrong results: every p value is either 0 or 1.
> fit1 <- fitNbinomGLMs(cdsFull, count ~ type + condition + type:condition)
> fit0 <- fitNbinomGLMs(cdsFull, count ~ type + type:condition)
> pvalsGLM <- nbinomGLMTest(fit1, fit0)

The smaller model makes no sense.  You are (in words) asking for an
interaction between condition and type without including a condition
(main) effect.  If you are confused about this (and I agree it might
be confusing at first depending on your training) you should read any
introduction to linear models.

Kasper



More information about the Bioconductor mailing list