[BioC] DESeq: Hypothesis testing in multifactor design

Michael Love michaelisaiahlove at gmail.com
Wed Aug 20 21:54:52 CEST 2014


Hi Yanzhu,

On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>
> Hi Mike,
>
> So you mean I need to estimate the dispersions every time when I have a
different full model for LRT?

Yes.

> In other words, so I need to use estimateDispersions() once when I test
the main effect, where the full model is ~A+B+C, and use
estimateDispersion() once when I test the three-way interaction term, where
the full model is ~A+B+C+A:B+A:C+B:C+A:B:C?
>
>
> One more question about estimateDispersion(), does it take a very long
time to estimate the dispersions? I have 16649 features and 726 biosamples,
and I run the estimateDispersion function around 9:30 am this morning, and
it hasn't done yet. Any suggestion that I can speed up the
estimateDispesions().
>

Yes, it takes time to estimate dispersion and to fit the GLM when you have
so many samples and when you have ~100 coefficients to fit with all the
interactions you are including.

You could also try the voom transformation and linear modeling with limma

Mike

> Your help will be greatly appreciate, thanks.
>
>
> Best,
>
>
>
> On Wed, Aug 20, 2014 at 12:44 PM, Michael Love <
michaelisaiahlove at gmail.com> wrote:
>>
>> Hi Yanzhu,
>>
>> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>> >
>> > Hi Mike,
>> >
>> > I am using DESeq2 package for my project now, and I have some
questions regarding to this pacakge.
>> >
>> > Please let me briefly introduce some background information about my
project. I have three factors: A with 16 levels, B with 2 levels and C with
3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group,
hence 96*8=768 biosamples in total. Due to some issues, we lost some
replicates for some groups, which ends up 726 biosamples, hence it is
unbalance design.
>> >
>> > Our purpose are to test the main effects of three factor: A, B and C,
the two-way interaction: A:B, A:C and B:C, and the three-way interaction
term: A:B:C. For example, I will compare full model: ~A+B+C with reduce
model: ~A+B to test factor C, and so on for other two main effects. For
testing three-way interaction A:B:C, I will compare full model:
~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my
questions.
>> >
>> >
>> > I have different full models for testing main effects, two-way
interaction and three-way interaction term. Will the dispersion estimation
affected by my full model? Can I specify the full model when I use
nbinomLRT()?
>>
>> No, you should not change the design in between, i.e. don't use a
different design for dispersion estimation and the full model in the LRT.
>>
>> Mike
>>
>> > In other words, can I use estimateDispersion() only once and fit
nbinomLRT with different full models as below:
>> >
>> >
>> >
dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design=~A+B+C+A:B+A:C+B:C+A:B:C)
>> >
>> > ### normalization
>> > dds=estimateSizeFactors(dds)
>> >
>> > ### dispersion estimation:
>> > dds=estimateDispersions(dds)
>> >
>> > ###Test three-way interaction term.
>> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C)
>> > ###Test main effect of factor A:
>> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C)
>> > ###Test main effect of factor B:
>> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C)
>> >
>> > ###Test main effect of factor C:
>> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B)
>> >
>> >
>> > Thanks,
>> >
>> >
>> > Yanzhu
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love <
michaelisaiahlove at gmail.com> wrote:
>> >>
>> >> hi Yanzhu,
>> >>
>> >> Note that we recommend users switch to using DESeq2, which also has
>> >> the likelihood ratio test you are using, and is faster and more
>> >> sensitive.
>> >>
>> >> The pipeline would look like:
>> >>
>> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C)
>> >>
>> >> for your first example.
>> >>
>> >> For your question, the terms of the reduced model should be contained
>> >> within the full model. Still there are a number of models which
>> >> satisfy this requirement, e.g. for testing B:C, you could use
>> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively.
>> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other
>> >> interaction terms depends on context, whether they are very
>> >> explanatory or not.
>> >>
>> >> Mike
>> >>
>> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] <
guest at bioconductor.org> wrote:
>> >> > Dear Community,
>> >> >
>> >> > I have a question about the hypothesis testing of the two-way
interaction terms in a multifactor design which includes three factors: A,
B and C.
>> >> >
>> >> > When I tested the three-way interaction I used the full and reduced
models as below for nbinomGLMTest():
>> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C
>> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C
>> >> >
>> >> > Now comes my question, when I want to test the effect of two-way
interaction terms, i.e., A:B, A:C or B:C, what should be my full and
reduced models? For example, when I want to the test the effect of A:B,
what should be my full and reduced models for nbinomGLMTest() using DESeq
pacakge?
>> >> >
>> >> >
>> >> > Best,
>> >> >
>> >> >
>> >> >
>> >> > Yanzhu
>> >> >
>> >> >
>> >> >  -- output of sessionInfo():
>> >> >
>> >> > sessionInfo()
>> >> > R version 3.1.0 (2014-04-10)
>> >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>> >> >
>> >> > locale:
>> >> > [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252    LC_MONETARY=English_United States.1252
>> >> > [4] LC_NUMERIC=C                           LC_TIME=English_United
States.1252
>> >> >
>> >> > attached base packages:
>> >> > [1] parallel  stats     graphics  grDevices utils     datasets
methods   base
>> >> >
>> >> > other attached packages:
>> >> > [1] DESeq_1.16.0        lattice_0.20-29     locfit_1.5-9.1
Biobase_2.24.0      BiocGenerics_0.10.0 edgeR_3.6.1         limma_3.20.1
>> >> >
>> >> > loaded via a namespace (and not attached):
>> >> >  [1] annotate_1.42.0      AnnotationDbi_1.26.0 DBI_0.2-7
genefilter_1.46.0    geneplotter_1.42.0   GenomeInfoDb_1.0.2
>> >> >  [7] grid_3.1.0           IRanges_1.22.6       MASS_7.3-31
RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.1.0
>> >> > [13] stats4_3.1.0         survival_2.37-7      tools_3.1.0
XML_3.98-1.1         xtable_1.7-3
>> >> >
>> >> >
>> >> > --
>> >> > Sent via the guest posting facility at bioconductor.org.
>> >> >
>> >> > _______________________________________________
>> >> > Bioconductor mailing list
>> >> > Bioconductor at r-project.org
>> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> > Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>
>

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list