[BioC] Conservative results using DEXSeq

Wolfgang Huber whuber at embl.de
Sat Jul 27 00:19:05 CEST 2013


Dear Levi

thanks, you are right: batch effects that are not confounded with the variable of interest can inflate within-group variation relative to between-group variation, and thus produce p-value distributions that are more concentrated towards 1 than uniform. Such an effect could play a role in addition to the one that Simon described.
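
A minimal sketch of this mechanism, as a toy simulation rather than an analysis of Gu's or Levi's data. The function name sim_pvals, the sample sizes, and the batch_sd value are arbitrary assumptions chosen for illustration. Every simulated gene is null for the group comparison, but each carries a gene-specific shift between two batches that are balanced across the groups.

```r
set.seed(1)

# Hypothetical helper: simulate null genes with a balanced, unmodelled batch shift
sim_pvals <- function(n_genes = 1000, batch_sd = 3) {
  group <- factor(rep(c("A", "B"), each = 6))
  # Each group contains 3 samples from each batch, so batch is not
  # confounded with group
  batch <- factor(rep(rep(c("b1", "b2"), each = 3), times = 2))
  p_naive <- p_adj <- numeric(n_genes)
  for (g in seq_len(n_genes)) {
    shift <- rnorm(1, sd = batch_sd)           # gene-specific batch effect
    y <- rnorm(12) + shift * (batch == "b2")   # no true group effect
    # Model ignoring batch: the shift ends up in the residual variance
    p_naive[g] <- summary(lm(y ~ group))$coefficients["groupB", "Pr(>|t|)"]
    # Model including batch: the shift is absorbed by the batch term
    p_adj[g] <- summary(lm(y ~ batch + group))$coefficients["groupB", "Pr(>|t|)"]
  }
  list(naive = p_naive, adjusted = p_adj)
}

res <- sim_pvals()

# Fraction of null genes called at p < 0.05 under each model
print(sapply(res, function(p) mean(p < 0.05)))

if (interactive()) {
  par(mfrow = c(1, 2))
  hist(res$naive, main = "batch ignored", xlab = "raw p-value")
  hist(res$adjusted, main = "batch in model", xlab = "raw p-value")
}
```

In this setup the histogram for the model without the batch term should pile up towards 1, while the one with the batch term should look roughly flat. Real data will rarely be this clean, so this only illustrates the direction of the effect.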

In Gu's case, further diagnostics are needed to disentangle and potentially fix the problem.
	
	Best wishes
	Wolfgang

	
On 24 Jul 2013, at 17:06, Levi Waldron <lwaldron.research at gmail.com> wrote:

> I have noticed the kind of p-value histograms that Gu describes in other
> situations also, even using the same technologies and bioinformatic methods
> as other situations where it doesn't occur.  I am not sure why it happened,
> but it could have to do with a batch effect that is *not* confounded with
> the outcome variable?
> 
> As an example I'm attaching raw p-value histograms of Cox regressions for
> each of 14 ovarian cancer datasets, code below.  At least one of these has
> the monotonic increase described.  This experiment used the same microarray
> platform as many of the other datasets (Affy hgu133plus2), but is the only
> experiment using microdissected tissues.  Point is just that the effect
> could be magnified for some reason relating to the experiment.
> 
> library(survival)
> library(affy)
> library(curatedOvarianData)
> if( !require("survHD") || packageVersion("survHD") != "0.99.1" ){
>    library(devtools)
>    install_url("https://bitbucket.org/lwaldron/survhd/downloads/survHD_0.99.1.tar.gz")
>    library(survHD)  # load the freshly installed package
> }
> 
> 
> source(system.file("extdata", "patientselection.config",
>                    package="curatedOvarianData"))
> source(system.file("extdata", "createEsetList.R",
>                    package="curatedOvarianData"))
> 
> pvals <- lapply(esets, function(eset) rowCoxTests(exprs(eset), eset$y)[, 3])
> 
> png("Cox_p-values.png")
> par(mfrow=c(4, 4))
> for (i in seq_along(pvals))
>    hist(pvals[[i]], main=names(pvals)[i], xlab="raw p-value")
> dev.off()
> 
> 
> 
> On Wed, Jul 24, 2013 at 3:55 AM, Simon Anders <anders at embl.de> wrote:
> 
>> Hi
>> 
>> 
>> On 23/07/13 14:47, Gu [guest] wrote:
>> 
>>> By checking the histogram of raw p-values of exons (NOT genes), I
>>> find that it is monotonically increasing from 0 to 1, with relatively
>>> few p-values falling into the bins from 0 to 0.2.
>>> 
>> 
>> You are right, DEXSeq sometimes tends to be overly conservative, which
>> then results in a skewed p value histogram as you describe it. Usually, it
>> is, however, only a rather slight skew, and it seems that the performance
>> is unusually bad for your specific dataset.
>> 
>> The main reason for the conservative results is the way we estimate
>> dispersion. Since the release of DEXSeq, we have made quite some progress
>> in improving the dispersion estimation by now using an empirical-Bayes
>> shrinkage estimator, and DESeq2 now offers a much better solution, at least
>> for gene-level tests. We are working on applying the same changes to
>> DEXSeq, and this should solve your issue. I'm afraid, however, that I have
>> to ask you for some patience until we are finished with these changes.
>> 
>>  Simon
>> 
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
> <Cox_p-values.png>


