[BioC] DESeq: Hypothesis testing in multifactor design

Steve Lianoglou lianoglou.steve at gene.com
Fri Aug 22 20:56:59 CEST 2014


Also, as Michael previously suggested you might want to try kicking
the tires on your data + analysis using limma::voom, especially for
data of this size.

Yes there are cases when one approach is better than the other, but
here the cal to `voom` will be relatively quick (my bet is that it's
*at most* 5 mins, but likely less than 1 ;-) and you can start
analyzing your data quickly.

Certainly try upgrading your R + bioconductor to use the latest
DESeq2, if you like, but you can get your analysis going w/
limma::voom while DESeq2 is estimating its dispersions, you could then
come around and see where the corner cases of your analysis are
between the results of the voom analysis vs. deseq2.

You could then report back here and tell us what you find as far what
the pros and cons of each approach are after you compare the results,
as these comparisons are always helpful/interesting for the greater
community.

HTH,
-steve


On Fri, Aug 22, 2014 at 7:23 AM, Michael Love
<michaelisaiahlove at gmail.com> wrote:
> Hi Yanzhu,
>
> I have already restructured this part of the code in the development
> branch, which will be released in  October. As I haven't seen this error
> before, i guess it has to do with the ~100 of coefficients and 700 samples,
> so I probably won't work on a fix for the release branch.
>
> You can either try the development branch of Bioconductor, or wait until
> the next release.
>
> Hope this helps,
>
> Mike
> On Aug 22, 2014 1:59 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>
>> Hi Mike,
>>
>> It took quite a long long time to estimate the dispersion using
>> estimateDispersions(), and it also reported an error message as below:
>>
>>
>>
>> dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design=~A+B+C+A:B+A:C+B:C+A:B:C)
>> dds=estimateSizeFactors(dds)
>> dds=estimateDispersions(dds)
>>
>>
>>
>>
>> *gene-wise dispersion estimatesError in simpleLoess(y, x, w, span, degree,
>> parametric, drop.square, normalize,  :   NA/NaN/Inf in foreign function
>> call (arg 1)*
>> Do you have any suggestion about how to solve this problem? Thanks.
>>
>>
>> Best,
>>
>>
>> Yanzhu
>>
>> On Wed, Aug 20, 2014 at 3:54 PM, Michael Love <michaelisaiahlove at gmail.com
>> > wrote:
>>
>>> Hi Yanzhu,
>>>
>>> On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>>> >
>>> > Hi Mike,
>>> >
>>> > So you mean I need to estimate the dispersions every time when I have a
>>> different full model for LRT?
>>>
>>> Yes.
>>>
>>> > In other words, so I need to use estimateDispersions() once when I test
>>> the main effect, where the full model is ~A+B+C, and use
>>> estimateDispersion() once when I test the three-way interaction term, where
>>> the full model is ~A+B+C+A:B+A:C+B:C+A:B:C?
>>> >
>>> >
>>> > One more question about estimateDispersion(), does it take a very long
>>> time to estimate the dispersions? I have 16649 features and 726 biosamples,
>>> and I run the estimateDispersion function around 9:30 am this morning, and
>>> it hasn't done yet. Any suggestion that I can speed up the
>>> estimateDispesions().
>>> >
>>>
>>> Yes, it takes time to estimate dispersion and to fit the GLM when you
>>> have so many samples and when you have ~100 coefficients to fit with all
>>> the interactions you are including.
>>>
>>> You could also try the voom transformation and linear modeling with limma
>>>
>>> Mike
>>>
>>> > Your help will be greatly appreciate, thanks.
>>> >
>>> >
>>> > Best,
>>> >
>>> >
>>> >
>>> > On Wed, Aug 20, 2014 at 12:44 PM, Michael Love <
>>> michaelisaiahlove at gmail.com> wrote:
>>> >>
>>> >> Hi Yanzhu,
>>> >>
>>> >> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at gmail.com> wrote:
>>> >> >
>>> >> > Hi Mike,
>>> >> >
>>> >> > I am using DESeq2 package for my project now, and I have some
>>> questions regarding to this pacakge.
>>> >> >
>>> >> > Please let me briefly introduce some background information about my
>>> project. I have three factors: A with 16 levels, B with 2 levels and C with
>>> 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group,
>>> hence 96*8=768 biosamples in total. Due to some issues, we lost some
>>> replicates for some groups, which ends up 726 biosamples, hence it is
>>> unbalance design.
>>> >> >
>>> >> > Our purpose are to test the main effects of three factor: A, B and
>>> C, the two-way interaction: A:B, A:C and B:C, and the three-way interaction
>>> term: A:B:C. For example, I will compare full model: ~A+B+C with reduce
>>> model: ~A+B to test factor C, and so on for other two main effects. For
>>> testing three-way interaction A:B:C, I will compare full model:
>>> ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my
>>> questions.
>>> >> >
>>> >> >
>>> >> > I have different full models for testing main effects, two-way
>>> interaction and three-way interaction term. Will the dispersion estimation
>>> affected by my full model? Can I specify the full model when I use
>>> nbinomLRT()?
>>> >>
>>> >> No, you should not change the design in between, i.e. don't use a
>>> different design for dispersion estimation and the full model in the LRT.
>>> >>
>>> >> Mike
>>> >>
>>> >> > In other words, can I use estimateDispersion() only once and fit
>>> nbinomLRT with different full models as below:
>>> >> >
>>> >> >
>>> >> >
>>> dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design=~A+B+C+A:B+A:C+B:C+A:B:C)
>>> >> >
>>> >> > ### normalization
>>> >> > dds=estimateSizeFactors(dds)
>>> >> >
>>> >> > ### dispersion estimation:
>>> >> > dds=estimateDispersions(dds)
>>> >> >
>>> >> > ###Test three-way interaction term.
>>> >> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C)
>>> >> > ###Test main effect of factor A:
>>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C)
>>> >> > ###Test main effect of factor B:
>>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C)
>>> >> >
>>> >> > ###Test main effect of factor C:
>>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B)
>>> >> >
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> >
>>> >> > Yanzhu
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love <
>>> michaelisaiahlove at gmail.com> wrote:
>>> >> >>
>>> >> >> hi Yanzhu,
>>> >> >>
>>> >> >> Note that we recommend users switch to using DESeq2, which also has
>>> >> >> the likelihood ratio test you are using, and is faster and more
>>> >> >> sensitive.
>>> >> >>
>>> >> >> The pipeline would look like:
>>> >> >>
>>> >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C)
>>> >> >>
>>> >> >> for your first example.
>>> >> >>
>>> >> >> For your question, the terms of the reduced model should be
>>> contained
>>> >> >> within the full model. Still there are a number of models which
>>> >> >> satisfy this requirement, e.g. for testing B:C, you could use
>>> >> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced
>>> respectively.
>>> >> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other
>>> >> >> interaction terms depends on context, whether they are very
>>> >> >> explanatory or not.
>>> >> >>
>>> >> >> Mike
>>> >> >>
>>> >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] <
>>> guest at bioconductor.org> wrote:
>>> >> >> > Dear Community,
>>> >> >> >
>>> >> >> > I have a question about the hypothesis testing of the two-way
>>> interaction terms in a multifactor design which includes three factors: A,
>>> B and C.
>>> >> >> >
>>> >> >> > When I tested the three-way interaction I used the full and
>>> reduced models as below for nbinomGLMTest():
>>> >> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C
>>> >> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C
>>> >> >> >
>>> >> >> > Now comes my question, when I want to test the effect of two-way
>>> interaction terms, i.e., A:B, A:C or B:C, what should be my full and
>>> reduced models? For example, when I want to the test the effect of A:B,
>>> what should be my full and reduced models for nbinomGLMTest() using DESeq
>>> pacakge?
>>> >> >> >
>>> >> >> >
>>> >> >> > Best,
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > Yanzhu
>>> >> >> >
>>> >> >> >
>>> >> >> >  -- output of sessionInfo():
>>> >> >> >
>>> >> >> > sessionInfo()
>>> >> >> > R version 3.1.0 (2014-04-10)
>>> >> >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>>> >> >> >
>>> >> >> > locale:
>>> >> >> > [1] LC_COLLATE=English_United States.1252
>>> LC_CTYPE=English_United States.1252    LC_MONETARY=English_United
>>> States.1252
>>> >> >> > [4] LC_NUMERIC=C                           LC_TIME=English_United
>>> States.1252
>>> >> >> >
>>> >> >> > attached base packages:
>>> >> >> > [1] parallel  stats     graphics  grDevices utils     datasets
>>> methods   base
>>> >> >> >
>>> >> >> > other attached packages:
>>> >> >> > [1] DESeq_1.16.0        lattice_0.20-29     locfit_1.5-9.1
>>> Biobase_2.24.0      BiocGenerics_0.10.0 edgeR_3.6.1         limma_3.20.1
>>> >> >> >
>>> >> >> > loaded via a namespace (and not attached):
>>> >> >> >  [1] annotate_1.42.0      AnnotationDbi_1.26.0 DBI_0.2-7
>>>   genefilter_1.46.0    geneplotter_1.42.0   GenomeInfoDb_1.0.2
>>> >> >> >  [7] grid_3.1.0           IRanges_1.22.6       MASS_7.3-31
>>>   RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.1.0
>>> >> >> > [13] stats4_3.1.0         survival_2.37-7      tools_3.1.0
>>>   XML_3.98-1.1         xtable_1.7-3
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Sent via the guest posting facility at bioconductor.org.
>>> >> >> >
>>> >> >> > _______________________________________________
>>> >> >> > Bioconductor mailing list
>>> >> >> > Bioconductor at r-project.org
>>> >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> >> >> > Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



-- 
Steve Lianoglou
Computational Biologist
Genentech



More information about the Bioconductor mailing list