[BioC] DESeq2: likelihood ratio test

Nhu Quynh T. Tran nhuquynh.t.tran at gmail.com
Tue Aug 26 17:51:01 CEST 2014


Thank you, Mike.

I have another question.  I remember you mentioned that LRT results is similar to the union of contrast Wald tests.  I have 80 genes significant by LRT, and 145 genes significant when combining all 3 comparisons. Are these numbers reasonable?

dpsc.dds <- DESeqDataSetFromMatrix(countData = dpsc.counts.data,
  colData = dpsc.mapping.data,
  design = ~ Gender+Disease)

dpsc.dds.LRT <- DESeq(dpsc.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender)
dpsc.res.LRT <- results(dpsc.dds.LRT)

dpsc.res <- results(dpsc.dds)
dpsc.res$padj.15qvsCon <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "Control"))$padj
dpsc.res$padj.15qvsAS <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "AS.Deletion"))$padj

My session info()
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] DESeq2_1.4.5            RcppArmadillo_0.4.320.0 Rcpp_0.11.2            
## [4] GenomicRanges_1.16.3    GenomeInfoDb_1.0.2      IRanges_1.22.10        
## [7] BiocGenerics_0.10.0    
## 
## loaded via a namespace (and not attached):
##  [1] annotate_1.42.1      AnnotationDbi_1.26.0 Biobase_2.24.0      
##  [4] DBI_0.2-7            digest_0.6.4         evaluate_0.5.5      
##  [7] formatR_0.10         genefilter_1.46.1    geneplotter_1.42.0  
## [10] grid_3.1.0           htmltools_0.2.4      knitr_1.6           
## [13] lattice_0.20-29      locfit_1.5-9.1       RColorBrewer_1.0-5  
## [16] rmarkdown_0.2.49     RSQLite_0.11.4       splines_3.1.0       
## [19] stats4_3.1.0         stringr_0.6.2        survival_2.37-7     
## [22] tools_3.1.0          XML_3.98-1.1         xtable_1.7-3        
## [25] XVector_0.4.0

On Aug 25, 2014, at 12:07 PM, Michael Love wrote:

> hi Quynh,
> 
> It's helpful to include the code and output of sessionInfo() for more
> detailed answers.
> 
> more answers inline:
> 
> 
> 
> 
> On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at memphis.edu> wrote:
>> Hi,
>> 
>> I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code:
>> 
>> neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data,
>>  colData = neuron.mapping.data,
>>  design = ~ Gender+Disease)
>> neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender)
>> 
>> The LRT test the full vs reduced model.  So, the null model is reduce and the alternative model is full.  When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only???
>> 
> 
> Yes, except the null includes genes affected by gender or not at all.
> 
> So we typically say "testing for the effect of disease, controlling for gender"
> 
>> Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1.  Why is this the case?
>> 
> 
> In the help page for ?results, we have:
> 
> "For analyses using the likelihood ratio test (using nbinomLRT), the
> p-values are determined solely by the difference in deviance between
> the full and reduced model formula. A log2 fold change is included,
> which can be controlled using the name argument, or by default this
> will be the estimated coefficient for the last element of
> resultsNames(object)."
> 
> It should also specify this  information when you show the results
> object if you are using version >= 1.4, e.g.:
> 
> log2 fold change: condition 2 vs 1
> LRT p-value: '~ condition' vs '~ 1'
> ...
> 
> If you want to test comparisons of pairs of levels of disease, you
> should be using the Wald test (the standard DESeq() workflow), instead
> of the LRT. The LRT is for testing all levels of disease at once.
> 
> When you use the LRT, using the 'name' argument only changes the
> columns of the results which have to do with the LFC's. For the
> current release of DESeq2 (v1.4), when you use the 'contrast'
> argument, the p-values are replaced with Wald test p-values. In the
> development branch, I am working on having more specific control of
> whether or not to replace the LRT p-values with Wald p-values. For
> now, you can build whatever results table you like by combining the
> LFC columns for all contrasts with the LRT statistic, p-values and
> adjusted p-value columns.
> 
> Mike
> 
>> Thanks,
>> Quynh
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list