[BioC] DESeq2: likelihood ratio test

Tue Aug 26 17:48:19 CEST 2014

Thank you, Mike.

I have another question.  I remember you mentioned that LRT results is similar to the union of contrast Wald tests.  I have 80 genes significant by LRT, and 145 genes significant when combining all 3 comparisons. Are these numbers reasonable?

dpsc.dds <- DESeqDataSetFromMatrix(countData = dpsc.counts.data,
  colData = dpsc.mapping.data,
  design = ~ Gender+Disease)

dpsc.dds.LRT <- DESeq(dpsc.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender)
dpsc.res.LRT <- results(dpsc.dds.LRT)

dpsc.res <- results(dpsc.dds)
dpsc.res$padj.15qvsCon <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "Control"))$padj
dpsc.res$padj.15qvsAS <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "AS.Deletion"))$padj

My session info()

## R version 3.1.0 (2014-04-10)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
## [1] DESeq2_1.4.5            RcppArmadillo_0.4.320.0 Rcpp_0.11.2
## [4] GenomicRanges_1.16.3    GenomeInfoDb_1.0.2      IRanges_1.22.10
## [7] BiocGenerics_0.10.0
##
## loaded via a namespace (and not attached):
##  [1] annotate_1.42.1      AnnotationDbi_1.26.0 Biobase_2.24.0
##  [4] DBI_0.2-7            digest_0.6.4         evaluate_0.5.5
##  [7] formatR_0.10         genefilter_1.46.1    geneplotter_1.42.0
## [10] grid_3.1.0           htmltools_0.2.4      knitr_1.6
## [13] lattice_0.20-29      locfit_1.5-9.1       RColorBrewer_1.0-5
## [16] rmarkdown_0.2.49     RSQLite_0.11.4       splines_3.1.0
## [19] stats4_3.1.0         stringr_0.6.2        survival_2.37-7
## [22] tools_3.1.0          XML_3.98-1.1         xtable_1.7-3
## [25] XVector_0.4.0

___________________________
Nhu Quynh T. Tran, PhD
Assistant Professor of Preventive Medicine
University of Tennessee Health Science Center
66 N. Pauline, Suite 633
Memphis, TN 38015
Phone: 901-448-1361
qtran1 at uthsc.edu<mailto:qtran1 at uthsc.edu>

On Aug 25, 2014, at 12:07 PM, Michael Love wrote:

hi Quynh,

It's helpful to include the code and output of sessionInfo() for more
detailed answers.

more answers inline:

On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at memphis.edu<mailto:qtran1 at memphis.edu>> wrote:
Hi,

I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code:

neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data,
 colData = neuron.mapping.data,
 design = ~ Gender+Disease)
neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender)

The LRT test the full vs reduced model.  So, the null model is reduce and the alternative model is full.  When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only???

Yes, except the null includes genes affected by gender or not at all.

So we typically say "testing for the effect of disease, controlling for gender"

Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1.  Why is this the case?

In the help page for ?results, we have:

"For analyses using the likelihood ratio test (using nbinomLRT), the
p-values are determined solely by the difference in deviance between
the full and reduced model formula. A log2 fold change is included,
which can be controlled using the name argument, or by default this
will be the estimated coefficient for the last element of
resultsNames(object)."

It should also specify this  information when you show the results
object if you are using version >= 1.4, e.g.:

log2 fold change: condition 2 vs 1
LRT p-value: '~ condition' vs '~ 1'
...

If you want to test comparisons of pairs of levels of disease, you
should be using the Wald test (the standard DESeq() workflow), instead
of the LRT. The LRT is for testing all levels of disease at once.

When you use the LRT, using the 'name' argument only changes the
columns of the results which have to do with the LFC's. For the
current release of DESeq2 (v1.4), when you use the 'contrast'
argument, the p-values are replaced with Wald test p-values. In the
development branch, I am working on having more specific control of
whether or not to replace the LRT p-values with Wald p-values. For
now, you can build whatever results table you like by combining the
LFC columns for all contrasts with the LRT statistic, p-values and
adjusted p-value columns.

Mike

Thanks,
Quynh
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

	[[alternative HTML version deleted]]