[BioC] DESeq: Hypothesis testing in multifactor design

Michael Love michaelisaiahlove at gmail.com
Fri Aug 22 16:23:37 CEST 2014


Hi Yanzhu,

I have already restructured this part of the code in the development
branch, which will be released in  October. As I haven't seen this error
before, i guess it has to do with the ~100 of coefficients and 700 samples,
so I probably won't work on a fix for the release branch.

You can either try the development branch of Bioconductor, or wait until
the next release.

Hope this helps,

Mike
On Aug 22, 2014 1:59 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:

> Hi Mike,
>
> It took quite a long long time to estimate the dispersion using
> estimateDispersions(), and it also reported an error message as below:
>
>
>
> dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design=~A+B+C+A:B+A:C+B:C+A:B:C)
> dds=estimateSizeFactors(dds)
> dds=estimateDispersions(dds)
>
>
>
>
> *gene-wise dispersion estimatesError in simpleLoess(y, x, w, span, degree,
> parametric, drop.square, normalize,  :   NA/NaN/Inf in foreign function
> call (arg 1)*
> Do you have any suggestion about how to solve this problem? Thanks.
>
>
> Best,
>
>
> Yanzhu
>
> On Wed, Aug 20, 2014 at 3:54 PM, Michael Love <michaelisaiahlove at gmail.com
> > wrote:
>
>> Hi Yanzhu,
>>
>> On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>> >
>> > Hi Mike,
>> >
>> > So you mean I need to estimate the dispersions every time when I have a
>> different full model for LRT?
>>
>> Yes.
>>
>> > In other words, so I need to use estimateDispersions() once when I test
>> the main effect, where the full model is ~A+B+C, and use
>> estimateDispersion() once when I test the three-way interaction term, where
>> the full model is ~A+B+C+A:B+A:C+B:C+A:B:C?
>> >
>> >
>> > One more question about estimateDispersion(), does it take a very long
>> time to estimate the dispersions? I have 16649 features and 726 biosamples,
>> and I run the estimateDispersion function around 9:30 am this morning, and
>> it hasn't done yet. Any suggestion that I can speed up the
>> estimateDispesions().
>> >
>>
>> Yes, it takes time to estimate dispersion and to fit the GLM when you
>> have so many samples and when you have ~100 coefficients to fit with all
>> the interactions you are including.
>>
>> You could also try the voom transformation and linear modeling with limma
>>
>> Mike
>>
>> > Your help will be greatly appreciate, thanks.
>> >
>> >
>> > Best,
>> >
>> >
>> >
>> > On Wed, Aug 20, 2014 at 12:44 PM, Michael Love <
>> michaelisaiahlove at gmail.com> wrote:
>> >>
>> >> Hi Yanzhu,
>> >>
>> >> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>> >> >
>> >> > Hi Mike,
>> >> >
>> >> > I am using DESeq2 package for my project now, and I have some
>> questions regarding to this pacakge.
>> >> >
>> >> > Please let me briefly introduce some background information about my
>> project. I have three factors: A with 16 levels, B with 2 levels and C with
>> 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group,
>> hence 96*8=768 biosamples in total. Due to some issues, we lost some
>> replicates for some groups, which ends up 726 biosamples, hence it is
>> unbalance design.
>> >> >
>> >> > Our purpose are to test the main effects of three factor: A, B and
>> C, the two-way interaction: A:B, A:C and B:C, and the three-way interaction
>> term: A:B:C. For example, I will compare full model: ~A+B+C with reduce
>> model: ~A+B to test factor C, and so on for other two main effects. For
>> testing three-way interaction A:B:C, I will compare full model:
>> ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my
>> questions.
>> >> >
>> >> >
>> >> > I have different full models for testing main effects, two-way
>> interaction and three-way interaction term. Will the dispersion estimation
>> affected by my full model? Can I specify the full model when I use
>> nbinomLRT()?
>> >>
>> >> No, you should not change the design in between, i.e. don't use a
>> different design for dispersion estimation and the full model in the LRT.
>> >>
>> >> Mike
>> >>
>> >> > In other words, can I use estimateDispersion() only once and fit
>> nbinomLRT with different full models as below:
>> >> >
>> >> >
>> >> >
>> dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design=~A+B+C+A:B+A:C+B:C+A:B:C)
>> >> >
>> >> > ### normalization
>> >> > dds=estimateSizeFactors(dds)
>> >> >
>> >> > ### dispersion estimation:
>> >> > dds=estimateDispersions(dds)
>> >> >
>> >> > ###Test three-way interaction term.
>> >> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C)
>> >> > ###Test main effect of factor A:
>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C)
>> >> > ###Test main effect of factor B:
>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C)
>> >> >
>> >> > ###Test main effect of factor C:
>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B)
>> >> >
>> >> >
>> >> > Thanks,
>> >> >
>> >> >
>> >> > Yanzhu
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love <
>> michaelisaiahlove at gmail.com> wrote:
>> >> >>
>> >> >> hi Yanzhu,
>> >> >>
>> >> >> Note that we recommend users switch to using DESeq2, which also has
>> >> >> the likelihood ratio test you are using, and is faster and more
>> >> >> sensitive.
>> >> >>
>> >> >> The pipeline would look like:
>> >> >>
>> >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C)
>> >> >>
>> >> >> for your first example.
>> >> >>
>> >> >> For your question, the terms of the reduced model should be
>> contained
>> >> >> within the full model. Still there are a number of models which
>> >> >> satisfy this requirement, e.g. for testing B:C, you could use
>> >> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced
>> respectively.
>> >> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other
>> >> >> interaction terms depends on context, whether they are very
>> >> >> explanatory or not.
>> >> >>
>> >> >> Mike
>> >> >>
>> >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] <
>> guest at bioconductor.org> wrote:
>> >> >> > Dear Community,
>> >> >> >
>> >> >> > I have a question about the hypothesis testing of the two-way
>> interaction terms in a multifactor design which includes three factors: A,
>> B and C.
>> >> >> >
>> >> >> > When I tested the three-way interaction I used the full and
>> reduced models as below for nbinomGLMTest():
>> >> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C
>> >> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C
>> >> >> >
>> >> >> > Now comes my question, when I want to test the effect of two-way
>> interaction terms, i.e., A:B, A:C or B:C, what should be my full and
>> reduced models? For example, when I want to the test the effect of A:B,
>> what should be my full and reduced models for nbinomGLMTest() using DESeq
>> pacakge?
>> >> >> >
>> >> >> >
>> >> >> > Best,
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > Yanzhu
>> >> >> >
>> >> >> >
>> >> >> >  -- output of sessionInfo():
>> >> >> >
>> >> >> > sessionInfo()
>> >> >> > R version 3.1.0 (2014-04-10)
>> >> >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>> >> >> >
>> >> >> > locale:
>> >> >> > [1] LC_COLLATE=English_United States.1252
>> LC_CTYPE=English_United States.1252    LC_MONETARY=English_United
>> States.1252
>> >> >> > [4] LC_NUMERIC=C                           LC_TIME=English_United
>> States.1252
>> >> >> >
>> >> >> > attached base packages:
>> >> >> > [1] parallel  stats     graphics  grDevices utils     datasets
>> methods   base
>> >> >> >
>> >> >> > other attached packages:
>> >> >> > [1] DESeq_1.16.0        lattice_0.20-29     locfit_1.5-9.1
>> Biobase_2.24.0      BiocGenerics_0.10.0 edgeR_3.6.1         limma_3.20.1
>> >> >> >
>> >> >> > loaded via a namespace (and not attached):
>> >> >> >  [1] annotate_1.42.0      AnnotationDbi_1.26.0 DBI_0.2-7
>>   genefilter_1.46.0    geneplotter_1.42.0   GenomeInfoDb_1.0.2
>> >> >> >  [7] grid_3.1.0           IRanges_1.22.6       MASS_7.3-31
>>   RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.1.0
>> >> >> > [13] stats4_3.1.0         survival_2.37-7      tools_3.1.0
>>   XML_3.98-1.1         xtable_1.7-3
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Sent via the guest posting facility at bioconductor.org.
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > Bioconductor mailing list
>> >> >> > Bioconductor at r-project.org
>> >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> >> > Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >> >
>> >> >
>> >
>> >
>>
>
>

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list