[BioC] DESeq: multi-factors testing questions

Tue Aug 12 16:51:31 CEST 2014

hi Yanzhu,

Apologies, I misread your email earlier, and thought you had 8 samples
total. You're right that you have many degrees of freedom here.

In response to your previous questions:

To paraphrase: to test the main effect of A, what's the different
between using a full model ~ A and reduced model ~ 1 vs. using a full
model ~ A + B and reduced model ~ B?  The difference is the the second
model "accounts for" differences due to B while testing the effect of
A. I would say that generally the second is preferred. You should
consult a local statistician to explain this difference though, as
this is a subtle one which is not easy to explain over email threads.

The same is true for testing C: ~ C and ~ 1 vs. ~ A + B + C and ~ A +
B. The second model "accounts for" differences due to A and B while
testing the effect of C. The same is also true for your questions
about the interaction terms, e.g. whether you should include the A:B
term when testing A:C or not.

These are subtle design choices which are best explained by consulting
someone locally. I can tell you that using DESeq2 with test="LRT" is
analogous to standard ANOVA for each gene, and so the same design
choices apply as would apply to standard ANOVA.

Mike

On Tue, Aug 12, 2014 at 10:03 AM, Yanzhu Lin <mlinyzh at gmail.com> wrote:
> Hi Mike,
>
> I would like to try DESeq2 package now.
>
> My project has three factors: A with 16 levels, B with 2 levels and C with 3
> levels, in total I have 16 x 3 x 2 = 96 groups, for each group, we have 8
> biological replicates in our original experiment design, which ends up with
> 96 x 8 =768 biosamples.
>
> I can't understand why we can't estimate any interaction terms if there are
> 8 replicates per group, which was mentioned in your previous email.
>
> Based on 8 biological replicates per group, I think we still have enough df
> to test each interaction term.
>
>
>
> Best,
>
>
> Yanzhu
>
>
>
>
>
>  Jan 15, 2014 at 11:12 AM, Michael Love <michaelisaiahlove at gmail.com> wrote:
>>
>> hi Yanzhu,
>>
>> Firstly, we recommend in general moving to DESeq2, where we spend most of
>> our development effort and which will be supported going foward.
>>
>> The DESeq2 workflow is very similar to DESeq and is described in detail in
>> the vignette. The constructor function from a count matrix and column data
>> is DESeqDataSetFromMatrix(), see the manual page for the required arguments.
>> For likelihood ratio tests, you can use the following code:
>>
>> dds <- DESeqDataSetFromMatrix(counts, columndata, design = ~ A)
>> dds <- DESeq(dds, test="LRT", reduced = ~ 1)
>> res <- results(dds)
>> res
>>
>> In order to give more detailed recommendations though, can you please tell
>> us more about the experimental setup and what questions you are trying to
>> answer? For instance, how are the samples distributed across groups A, B and
>> C? I would suppose they are not 8 replicates per group, as this would not
>> allow you to estimate any interaction terms.
>>
>> Mike
>>
>>
>> On Mon, Jan 13, 2014 at 2:15 PM, Yanzhu [guest] <guest at bioconductor.org>
>> wrote:
>>>
>>>
>>> Dear Community,
>>>
>>> I have some questions about how the DESeq r package  works for
>>> multi-factors expersiment. My experiment has three factors: A/B/C, and 8
>>> replicates per condition. I would like the test the significance of the main
>>> effects of factor A, B and C, the significance of the two-way interaction
>>> terms: A:B, A:C and B:C, and the significance of the three-way interaction
>>> term: A:B:C. I want the table of pvalue for each term (main effects, two-way
>>> interaction terms and the three-way interaction term) like what ANOVA does
>>> for each gene.
>>>
>>>
>>> I know to test the significance of the three-way interaction term, we use
>>> the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C+A:B:C)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> My Questions are: how can I test the significance of main effects and the
>>> two-way interaction terms?
>>>
>>> 1. To test the main effect of A, B and C
>>>
>>> (i) To test the main effect of A:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> (ii) To test the main effect of B:
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~B)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> OR:
>>>
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B)
>>> itDeSeq0<-fitNbinomGLMs(cdsFull,count~B)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> Which one is correct?
>>>
>>> (iii) To test the main effect of C:
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~C)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> OR:
>>>
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C)
>>> itDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> which one is correct?
>>>
>>> 2. To test the two-way interaction terms: A:B, A:C and B:C
>>>
>>> (i) To test the two-way interaction term: A:B
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> Is it correct?
>>>
>>> (ii) To test the two-way interaction term: A:C
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> OR:
>>>
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:C)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> Which one is correct?
>>>
>>> (iii) To test the two-way interaction term: B:C
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> OR:
>>>
>>> Do I need to use the following coding:
>>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+B:C)
>>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C)
>>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0)
>>>
>>> Which one is correct?
>>>
>>> Thank you!
>>>
>>>
>>>
>>>
>>>
>>>
>>>  -- output of sessionInfo():
>>>
>>>  sessionInfo()
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>>> States.1252    LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C                           LC_TIME=English_United
>>> States.1252
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> base
>>>
>>> other attached packages:
>>> [1] DESeq_1.12.1       lattice_0.20-15    locfit_1.5-9.1
>>> Biobase_2.20.1     BiocGenerics_0.6.0 edgeR_3.2.4
>>> [7] limma_3.16.8
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] annotate_1.38.0      AnnotationDbi_1.22.6 DBI_0.2-7
>>> genefilter_1.42.0    geneplotter_1.38.0
>>>  [6] grid_3.0.1           IRanges_1.18.4       MASS_7.3-26
>>> RColorBrewer_1.0-5   RSQLite_0.11.4
>>> [11] splines_3.0.1        stats4_3.0.1         survival_2.37-4
>>> tools_3.0.1          XML_3.98-1.1
>>> [16] xtable_1.7-1
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>