[BioC] DESeq2: likelihood ratio test

Mon Aug 25 19:07:06 CEST 2014

hi Quynh,

It's helpful to include the code and output of sessionInfo() for more
detailed answers.

more answers inline:

On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at memphis.edu> wrote:
> Hi,
>
> I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code:
>
> neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data,
>   colData = neuron.mapping.data,
>   design = ~ Gender+Disease)
> neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender)
>
> The LRT test the full vs reduced model.  So, the null model is reduce and the alternative model is full.  When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only???
>

Yes, except the null includes genes affected by gender or not at all.

So we typically say "testing for the effect of disease, controlling for gender"

> Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1.  Why is this the case?
>

In the help page for ?results, we have:

"For analyses using the likelihood ratio test (using nbinomLRT), the
p-values are determined solely by the difference in deviance between
the full and reduced model formula. A log2 fold change is included,
which can be controlled using the name argument, or by default this
will be the estimated coefficient for the last element of
resultsNames(object)."

It should also specify this  information when you show the results
object if you are using version >= 1.4, e.g.:

log2 fold change: condition 2 vs 1
LRT p-value: '~ condition' vs '~ 1'
...

If you want to test comparisons of pairs of levels of disease, you
should be using the Wald test (the standard DESeq() workflow), instead
of the LRT. The LRT is for testing all levels of disease at once.

When you use the LRT, using the 'name' argument only changes the
columns of the results which have to do with the LFC's. For the
current release of DESeq2 (v1.4), when you use the 'contrast'
argument, the p-values are replaced with Wald test p-values. In the
development branch, I am working on having more specific control of
whether or not to replace the LRT p-values with Wald p-values. For
now, you can build whatever results table you like by combining the
LFC columns for all contrasts with the LRT statistic, p-values and
adjusted p-value columns.

Mike

> Thanks,
> Quynh
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor