[BioC] DESeq2 test over all timepoints?

Wed Feb 12 13:27:01 CET 2014

Dear Mike (and other DESeq2 developers/users),

I came across this thread regarding the use of the GLM functions in DESeq2 with respect to time series. However, if I try and perform the analysis you describe then I get an error when estimating the dispersions— do you know what is the cause and/or what I could be doing wrong?

dds <- DESeqDataSetFromMatrix( countData = countMatrix, colData = colData, design = ~ time + treatment + treatment:time)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
gene-wise dispersion estimates
Error in fitDisp(ySEXP = counts(objectNZ), xSEXP = fit$modelMatrix, mu_hatSEXP = fit$mu_hat,  : 
  in call to fitDisp, the following arguments contain NA: mu_hatSEXP

The matrix of counts and the colData are OK, i think..

This is the colData df used:

       treatment time
s1    treated   T1
s2    treated   T1
s3    treated   T1
s4    untreated   T1
s5    untreated   T1
s6    untreated   T1
s7    treated   T2
s8    treated   T2
s9    treated   T2
s10  untreated   T2
s11  untreated   T2
s12  untreated   T2

Bests,

Richard

# Session info:

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_0.9.3.1         gplots_2.11.0           MASS_7.3-26             KernSmooth_2.23-10     
 [5] caTools_1.14            gdata_2.12.0            gtools_2.7.1            reshape_0.8.4          
 [9] plyr_1.8                pasilla_0.2.16          DESeq_1.12.0            locfit_1.5-9           
[13] DEXSeq_1.6.0            parathyroidSE_0.99.5    DESeq2_1.0.9            RcppArmadillo_0.3.810.2
[17] Rcpp_0.10.3             lattice_0.20-15         Biobase_2.20.0          GenomicRanges_1.12.2   
[21] IRanges_1.18.0          BiocGenerics_0.6.0     

loaded via a namespace (and not attached):
 [1] annotate_1.38.0      AnnotationDbi_1.22.3 biomaRt_2.16.0       Biostrings_2.28.0    bitops_1.0-5        
 [6] colorspace_1.2-2     DBI_0.2-6            dichromat_2.0-0      digest_0.6.3         genefilter_1.42.0   
[11] geneplotter_1.38.0   gtable_0.1.2         hwriter_1.3          labeling_0.1         munsell_0.4         
[16] proto_0.3-10         RColorBrewer_1.0-5   RCurl_1.95-4.1       reshape2_1.2.2       Rsamtools_1.12.2    
[21] RSQLite_0.11.3       scales_0.2.3         splines_3.0.0        statmod_1.4.17       stats4_3.0.0        
[26] stringr_0.6.2        survival_2.37-4      tools_3.0.0          XML_3.95-0.2         xtable_1.7-1        
[31] zlibbioc_1.6.0      

>hi Charles,

>>On Tue, Jul 9, 2013 at 3:59 PM, Charles Determan Jr wrote:
>>Greetings,

>>I have used the DESeq package previously and have been recently using
>>DESeq2. I am particularly interested in repeated measures designs and was
>>wondering about applications with DESeq2. I have read through the manual
>>and tried searching the archives but couldn't find too much direction for
>>testing over all timepoints instead of just one at a time (ANOVA-like
>>approach). Reading the edgeR manual, it provides an example in section
>>3.3.4 that tests whether a treatment has an effect at any time by taking
>>multiple coefficients (i.e. lrt <- glmLRT(fit, coef=5:6)). I attempted
>>something similar with DESeq2:

>>res <- results(dds, name=resultsNames(dds)[5:6]

>>but I got the warning message saying only the first element used:

>>Warning message:In if (paste0("WaldPvalue_", name) %in%
>>names(mcols(object))) { :
>>the condition has length > 1 and only the first element will be used

>I should clean up the code to provide a warning here, as the results()
>function should only accept a character vector of length 1 for the
>argument 'name'.

>The proper way to test for the significance of multiple coefficients
>at once is to use the nbinomLRT() function in DESeq2 and specify a
>reduced formula. To test whether the treatment effect at all times is
>different than at the baseline time, the reduced formula would remove
>the interaction term between treatment and time, so:

>design(dds) <- formula(~ time + treatment + treatment:time)
>dds <- estimateSizeFactors(dds)
>dds <- estimateDispersions(dds)
>dds <- nbinomLRT(dds, reduced = formula(~ time + treatment))
>res <- results(dds)

>If you presume that the treatment effect is the same at all times, you
>can test whether the treatment effect is equal to 0 with:

># using the Wald test and coefficient shrinkage
>design(dds) <- formula(~ time + treatment)
>dds <- DESeq(dds)
>res <- results(dds)

># or using the likelihood ratio test as in the previous example
>design(dds) <- formula(~ time + treatment)
>dds <- estimateSizeFactors(dds)
>dds <- estimateDispersions(dds)
>dds <- nbinomLRT(dds, reduced = formula(~ time))
>res <- results(dds)

>The main difference here between the Wald and LRT tests is the
>shrinkage of estimated log2 fold changes to 0 using the default
>DESeq() function/Wald test.

>I will add more examples to the vignette to better explain these cases
>of testing multiple coefficients.

>Mike