[BioC] DESeq: Hypothesis testing in multifactor design
Michael Love
michaelisaiahlove at gmail.com
Wed Aug 20 21:54:52 CEST 2014
Hi Yanzhu,
On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>
> Hi Mike,
>
> So you mean I need to estimate the dispersions every time when I have a
different full model for LRT?
Yes.
> In other words, so I need to use estimateDispersions() once when I test
the main effect, where the full model is ~A+B+C, and use
estimateDispersion() once when I test the three-way interaction term, where
the full model is ~A+B+C+A:B+A:C+B:C+A:B:C?
>
>
> One more question about estimateDispersion(), does it take a very long
time to estimate the dispersions? I have 16649 features and 726 biosamples,
and I run the estimateDispersion function around 9:30 am this morning, and
it hasn't done yet. Any suggestion that I can speed up the
estimateDispesions().
>
Yes, it takes time to estimate dispersion and to fit the GLM when you have
so many samples and when you have ~100 coefficients to fit with all the
interactions you are including.
You could also try the voom transformation and linear modeling with limma
Mike
> Your help will be greatly appreciate, thanks.
>
>
> Best,
>
>
>
> On Wed, Aug 20, 2014 at 12:44 PM, Michael Love <
michaelisaiahlove at gmail.com> wrote:
>>
>> Hi Yanzhu,
>>
>> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>> >
>> > Hi Mike,
>> >
>> > I am using DESeq2 package for my project now, and I have some
questions regarding to this pacakge.
>> >
>> > Please let me briefly introduce some background information about my
project. I have three factors: A with 16 levels, B with 2 levels and C with
3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group,
hence 96*8=768 biosamples in total. Due to some issues, we lost some
replicates for some groups, which ends up 726 biosamples, hence it is
unbalance design.
>> >
>> > Our purpose are to test the main effects of three factor: A, B and C,
the two-way interaction: A:B, A:C and B:C, and the three-way interaction
term: A:B:C. For example, I will compare full model: ~A+B+C with reduce
model: ~A+B to test factor C, and so on for other two main effects. For
testing three-way interaction A:B:C, I will compare full model:
~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my
questions.
>> >
>> >
>> > I have different full models for testing main effects, two-way
interaction and three-way interaction term. Will the dispersion estimation
affected by my full model? Can I specify the full model when I use
nbinomLRT()?
>>
>> No, you should not change the design in between, i.e. don't use a
different design for dispersion estimation and the full model in the LRT.
>>
>> Mike
>>
>> > In other words, can I use estimateDispersion() only once and fit
nbinomLRT with different full models as below:
>> >
>> >
>> >
dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design=~A+B+C+A:B+A:C+B:C+A:B:C)
>> >
>> > ### normalization
>> > dds=estimateSizeFactors(dds)
>> >
>> > ### dispersion estimation:
>> > dds=estimateDispersions(dds)
>> >
>> > ###Test three-way interaction term.
>> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C)
>> > ###Test main effect of factor A:
>> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C)
>> > ###Test main effect of factor B:
>> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C)
>> >
>> > ###Test main effect of factor C:
>> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B)
>> >
>> >
>> > Thanks,
>> >
>> >
>> > Yanzhu
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love <
michaelisaiahlove at gmail.com> wrote:
>> >>
>> >> hi Yanzhu,
>> >>
>> >> Note that we recommend users switch to using DESeq2, which also has
>> >> the likelihood ratio test you are using, and is faster and more
>> >> sensitive.
>> >>
>> >> The pipeline would look like:
>> >>
>> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C)
>> >>
>> >> for your first example.
>> >>
>> >> For your question, the terms of the reduced model should be contained
>> >> within the full model. Still there are a number of models which
>> >> satisfy this requirement, e.g. for testing B:C, you could use
>> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively.
>> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other
>> >> interaction terms depends on context, whether they are very
>> >> explanatory or not.
>> >>
>> >> Mike
>> >>
>> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] <
guest at bioconductor.org> wrote:
>> >> > Dear Community,
>> >> >
>> >> > I have a question about the hypothesis testing of the two-way
interaction terms in a multifactor design which includes three factors: A,
B and C.
>> >> >
>> >> > When I tested the three-way interaction I used the full and reduced
models as below for nbinomGLMTest():
>> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C
>> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C
>> >> >
>> >> > Now comes my question, when I want to test the effect of two-way
interaction terms, i.e., A:B, A:C or B:C, what should be my full and
reduced models? For example, when I want to the test the effect of A:B,
what should be my full and reduced models for nbinomGLMTest() using DESeq
pacakge?
>> >> >
>> >> >
>> >> > Best,
>> >> >
>> >> >
>> >> >
>> >> > Yanzhu
>> >> >
>> >> >
>> >> > -- output of sessionInfo():
>> >> >
>> >> > sessionInfo()
>> >> > R version 3.1.0 (2014-04-10)
>> >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>> >> >
>> >> > locale:
>> >> > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252 LC_MONETARY=English_United States.1252
>> >> > [4] LC_NUMERIC=C LC_TIME=English_United
States.1252
>> >> >
>> >> > attached base packages:
>> >> > [1] parallel stats graphics grDevices utils datasets
methods base
>> >> >
>> >> > other attached packages:
>> >> > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1
Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1
>> >> >
>> >> > loaded via a namespace (and not attached):
>> >> > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7
genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2
>> >> > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31
RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0
>> >> > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0
XML_3.98-1.1 xtable_1.7-3
>> >> >
>> >> >
>> >> > --
>> >> > Sent via the guest posting facility at bioconductor.org.
>> >> >
>> >> > _______________________________________________
>> >> > Bioconductor mailing list
>> >> > Bioconductor at r-project.org
>> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> > Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>
>
[[alternative HTML version deleted]]
More information about the Bioconductor
mailing list