[BioC] DESeq2: likelihood ratio test

Nhu Quynh T. Tran nhuquynh.t.tran at gmail.com
Tue Aug 26 18:46:29 CEST 2014


Thank you, Mike!

Quynh

On Aug 26, 2014, at 11:22 AM, Michael Love wrote:

> Hi Quynh,
> 
> On Aug 26, 2014 5:48 PM, "Tran, Nhu Quynh T" <qtran1 at uthsc.edu> wrote:
> >
> > Thank you, Mike.
> >
> > I have another question.  I remember you mentioned that LRT results is similar to the union of contrast Wald tests.
> 
> Not exactly.
> 
> While the LRT is a test of significance for differences of any level of the factor, I would not expect it to be exactly equal to the union of sets of genes using Wald tests.
> 
> Use the LRT if you want to describe the set of genes from any level of disease. Use the Wald tests if you want to describe the set of genes from pairs of levels.
> 
> > I have 80 genes significant by LRT, and 145 genes significant when combining all 3 comparisons. Are these numbers reasonable?
> >
> > dpsc.dds <- DESeqDataSetFromMatrix(countData = dpsc.counts.data,
> >   colData = dpsc.mapping.data,
> >   design = ~ Gender+Disease)
> >
> > dpsc.dds.LRT <- DESeq(dpsc.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender)
> > dpsc.res.LRT <- results(dpsc.dds.LRT)
> >
> > dpsc.res <- results(dpsc.dds)
> > dpsc.res$padj.15qvsCon <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "Control"))$padj
> > dpsc.res$padj.15qvsAS <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "AS.Deletion"))$padj
> >
> > My session info()
> >
> > ## R version 3.1.0 (2014-04-10)
> > ## Platform: x86_64-apple-darwin10.8.0 (64-bit)
> > ## 
> > ## locale:
> > ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> > ## 
> > ## attached base packages:
> > ## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
> > ## [8] base     
> > ## 
> > ## other attached packages:
> > ## [1] DESeq2_1.4.5            RcppArmadillo_0.4.320.0 Rcpp_0.11.2            
> > ## [4] GenomicRanges_1.16.3    GenomeInfoDb_1.0.2      IRanges_1.22.10        
> > ## [7] BiocGenerics_0.10.0    
> > ## 
> > ## loaded via a namespace (and not attached):
> > ##  [1] annotate_1.42.1      AnnotationDbi_1.26.0 Biobase_2.24.0      
> > ##  [4] DBI_0.2-7            digest_0.6.4         evaluate_0.5.5      
> > ##  [7] formatR_0.10         genefilter_1.46.1    geneplotter_1.42.0  
> > ## [10] grid_3.1.0           htmltools_0.2.4      knitr_1.6           
> > ## [13] lattice_0.20-29      locfit_1.5-9.1       RColorBrewer_1.0-5  
> > ## [16] rmarkdown_0.2.49     RSQLite_0.11.4       splines_3.1.0       
> > ## [19] stats4_3.1.0         stringr_0.6.2        survival_2.37-7     
> > ## [22] tools_3.1.0          XML_3.98-1.1         xtable_1.7-3        
> > ## [25] XVector_0.4.0
> >
> >
> >
> > ___________________________
> > Nhu Quynh T. Tran, PhD
> > Assistant Professor of Preventive Medicine
> > University of Tennessee Health Science Center
> > 66 N. Pauline, Suite 633
> > Memphis, TN 38015
> > Phone: 901-448-1361
> > qtran1 at uthsc.edu
> >
> >
> >
> >
> > On Aug 25, 2014, at 12:07 PM, Michael Love wrote:
> >
> >> hi Quynh,
> >>
> >> It's helpful to include the code and output of sessionInfo() for more
> >> detailed answers.
> >>
> >> more answers inline:
> >>
> >>
> >>
> >>
> >> On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at memphis.edu> wrote:
> >>>
> >>> Hi,
> >>>
> >>>
> >>> I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code:
> >>>
> >>>
> >>> neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data,
> >>>
> >>>  colData = neuron.mapping.data,
> >>>
> >>>  design = ~ Gender+Disease)
> >>>
> >>> neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender)
> >>>
> >>>
> >>> The LRT test the full vs reduced model.  So, the null model is reduce and the alternative model is full.  When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only???
> >>>
> >>>
> >>
> >> Yes, except the null includes genes affected by gender or not at all.
> >>
> >> So we typically say "testing for the effect of disease, controlling for gender"
> >>
> >>> Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1.  Why is this the case?
> >>>
> >>>
> >>
> >> In the help page for ?results, we have:
> >>
> >> "For analyses using the likelihood ratio test (using nbinomLRT), the
> >> p-values are determined solely by the difference in deviance between
> >> the full and reduced model formula. A log2 fold change is included,
> >> which can be controlled using the name argument, or by default this
> >> will be the estimated coefficient for the last element of
> >> resultsNames(object)."
> >>
> >> It should also specify this  information when you show the results
> >> object if you are using version >= 1.4, e.g.:
> >>
> >> log2 fold change: condition 2 vs 1
> >> LRT p-value: '~ condition' vs '~ 1'
> >> ...
> >>
> >> If you want to test comparisons of pairs of levels of disease, you
> >> should be using the Wald test (the standard DESeq() workflow), instead
> >> of the LRT. The LRT is for testing all levels of disease at once.
> >>
> >> When you use the LRT, using the 'name' argument only changes the
> >> columns of the results which have to do with the LFC's. For the
> >> current release of DESeq2 (v1.4), when you use the 'contrast'
> >> argument, the p-values are replaced with Wald test p-values. In the
> >> development branch, I am working on having more specific control of
> >> whether or not to replace the LRT p-values with Wald p-values. For
> >> now, you can build whatever results table you like by combining the
> >> LFC columns for all contrasts with the LRT statistic, p-values and
> >> adjusted p-value columns.
> >>
> >> Mike
> >>
> >>> Thanks,
> >>>
> >>> Quynh
> >>>
> >>> _______________________________________________
> >>>
> >>> Bioconductor mailing list
> >>>
> >>> Bioconductor at r-project.org
> >>>
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>
> >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >


	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list