[BioC] DESeq: Hypothesis testing in multifactor design

Yanzhu Lin mlinyzh at gmail.com
Wed Aug 20 22:21:40 CEST 2014


Hi Mike,

Thank you so much for your prompt reply. I really appreciate it.


Best,


Yanzhu

On Wed, Aug 20, 2014 at 3:54 PM, Michael Love <michaelisaiahlove at gmail.com>
wrote:

> Hi Yanzhu,
>
> On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
> >
> > Hi Mike,
> >
> > So you mean I need to estimate the dispersions every time when I have a
> different full model for LRT?
>
> Yes.
>
> > In other words, so I need to use estimateDispersions() once when I test
> the main effect, where the full model is ~A+B+C, and use
> estimateDispersion() once when I test the three-way interaction term, where
> the full model is ~A+B+C+A:B+A:C+B:C+A:B:C?
> >
> >
> > One more question about estimateDispersion(), does it take a very long
> time to estimate the dispersions? I have 16649 features and 726 biosamples,
> and I run the estimateDispersion function around 9:30 am this morning, and
> it hasn't done yet. Any suggestion that I can speed up the
> estimateDispesions().
> >
>
> Yes, it takes time to estimate dispersion and to fit the GLM when you have
> so many samples and when you have ~100 coefficients to fit with all the
> interactions you are including.
>
> You could also try the voom transformation and linear modeling with limma
>
> Mike
>
> > Your help will be greatly appreciate, thanks.
> >
> >
> > Best,
> >
> >
> >
> > On Wed, Aug 20, 2014 at 12:44 PM, Michael Love <
> michaelisaiahlove at gmail.com> wrote:
> >>
> >> Hi Yanzhu,
> >>
> >> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
> >> >
> >> > Hi Mike,
> >> >
> >> > I am using DESeq2 package for my project now, and I have some
> questions regarding to this pacakge.
> >> >
> >> > Please let me briefly introduce some background information about my
> project. I have three factors: A with 16 levels, B with 2 levels and C with
> 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group,
> hence 96*8=768 biosamples in total. Due to some issues, we lost some
> replicates for some groups, which ends up 726 biosamples, hence it is
> unbalance design.
> >> >
> >> > Our purpose are to test the main effects of three factor: A, B and C,
> the two-way interaction: A:B, A:C and B:C, and the three-way interaction
> term: A:B:C. For example, I will compare full model: ~A+B+C with reduce
> model: ~A+B to test factor C, and so on for other two main effects. For
> testing three-way interaction A:B:C, I will compare full model:
> ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my
> questions.
> >> >
> >> >
> >> > I have different full models for testing main effects, two-way
> interaction and three-way interaction term. Will the dispersion estimation
> affected by my full model? Can I specify the full model when I use
> nbinomLRT()?
> >>
> >> No, you should not change the design in between, i.e. don't use a
> different design for dispersion estimation and the full model in the LRT.
> >>
> >> Mike
> >>
> >> > In other words, can I use estimateDispersion() only once and fit
> nbinomLRT with different full models as below:
> >> >
> >> >
> >> >
> dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design=~A+B+C+A:B+A:C+B:C+A:B:C)
> >> >
> >> > ### normalization
> >> > dds=estimateSizeFactors(dds)
> >> >
> >> > ### dispersion estimation:
> >> > dds=estimateDispersions(dds)
> >> >
> >> > ###Test three-way interaction term.
> >> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C)
> >> > ###Test main effect of factor A:
> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C)
> >> > ###Test main effect of factor B:
> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C)
> >> >
> >> > ###Test main effect of factor C:
> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B)
> >> >
> >> >
> >> > Thanks,
> >> >
> >> >
> >> > Yanzhu
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love <
> michaelisaiahlove at gmail.com> wrote:
> >> >>
> >> >> hi Yanzhu,
> >> >>
> >> >> Note that we recommend users switch to using DESeq2, which also has
> >> >> the likelihood ratio test you are using, and is faster and more
> >> >> sensitive.
> >> >>
> >> >> The pipeline would look like:
> >> >>
> >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C)
> >> >>
> >> >> for your first example.
> >> >>
> >> >> For your question, the terms of the reduced model should be contained
> >> >> within the full model. Still there are a number of models which
> >> >> satisfy this requirement, e.g. for testing B:C, you could use
> >> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively.
> >> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other
> >> >> interaction terms depends on context, whether they are very
> >> >> explanatory or not.
> >> >>
> >> >> Mike
> >> >>
> >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] <
> guest at bioconductor.org> wrote:
> >> >> > Dear Community,
> >> >> >
> >> >> > I have a question about the hypothesis testing of the two-way
> interaction terms in a multifactor design which includes three factors: A,
> B and C.
> >> >> >
> >> >> > When I tested the three-way interaction I used the full and
> reduced models as below for nbinomGLMTest():
> >> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C
> >> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C
> >> >> >
> >> >> > Now comes my question, when I want to test the effect of two-way
> interaction terms, i.e., A:B, A:C or B:C, what should be my full and
> reduced models? For example, when I want to the test the effect of A:B,
> what should be my full and reduced models for nbinomGLMTest() using DESeq
> pacakge?
> >> >> >
> >> >> >
> >> >> > Best,
> >> >> >
> >> >> >
> >> >> >
> >> >> > Yanzhu
> >> >> >
> >> >> >
> >> >> >  -- output of sessionInfo():
> >> >> >
> >> >> > sessionInfo()
> >> >> > R version 3.1.0 (2014-04-10)
> >> >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >> >> >
> >> >> > locale:
> >> >> > [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252
> >> >> > [4] LC_NUMERIC=C                           LC_TIME=English_United
> States.1252
> >> >> >
> >> >> > attached base packages:
> >> >> > [1] parallel  stats     graphics  grDevices utils     datasets
> methods   base
> >> >> >
> >> >> > other attached packages:
> >> >> > [1] DESeq_1.16.0        lattice_0.20-29     locfit_1.5-9.1
> Biobase_2.24.0      BiocGenerics_0.10.0 edgeR_3.6.1         limma_3.20.1
> >> >> >
> >> >> > loaded via a namespace (and not attached):
> >> >> >  [1] annotate_1.42.0      AnnotationDbi_1.26.0 DBI_0.2-7
>   genefilter_1.46.0    geneplotter_1.42.0   GenomeInfoDb_1.0.2
> >> >> >  [7] grid_3.1.0           IRanges_1.22.6       MASS_7.3-31
>   RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.1.0
> >> >> > [13] stats4_3.1.0         survival_2.37-7      tools_3.1.0
>   XML_3.98-1.1         xtable_1.7-3
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Sent via the guest posting facility at bioconductor.org.
> >> >> >
> >> >> > _______________________________________________
> >> >> > Bioconductor mailing list
> >> >> > Bioconductor at r-project.org
> >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> >> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >> >
> >> >
> >
> >
>

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list