[BioC] DEXSeq update results change

Sat Aug 16 12:24:42 CEST 2014

Dear Antonio

can you send one or more specific examples, i.e. 
- the count table for the affected gene(s), for all its exons, and/or the plotDEXSeq output
- the size factors

This should help all of us understand better, and perhaps fix, what you’re unhappy about.
What DEXSeq does is not a black box, it is in fact very simple, so we should be able to get to the bottom of this.
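
For readers unsure what the size factors above refer to: they are usually obtained by a median-of-ratios calculation. The following is only a minimal sketch in plain Python, assuming that calculation; it is not DEXSeq's own code, the function name `size_factors` is hypothetical, and the counts are made up for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DEXSeq code).

    counts: list of rows (one per exon/feature), each a list of
    per-sample counts. Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Log geometric mean of each feature across samples; features that
    # contain a zero count are skipped (marked None).
    log_geo_means = []
    for row in counts:
        if all(c > 0 for c in row):
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
        else:
            log_geo_means.append(None)
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the feature's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(counts, log_geo_means)
                      if lgm is not None]
        # The median makes the estimate robust to a few changed features.
        factors.append(math.exp(median(log_ratios)))
    return factors

# Made-up counts: 4 features (rows) x 3 samples (columns).
example_counts = [
    [120, 250, 110],
    [ 30,  64,  33],
    [500, 980, 510],
    [  0,  12,   4],   # contains a zero, so it is ignored
]
print(size_factors(example_counts))
```

A sample sequenced roughly twice as deeply as the others should come out with a size factor roughly twice as large, which is a quick sanity check before sending the numbers to the list.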

Regarding the question in the second paragraph: if you have reason to assume that the biological variability is the same in all your conditions (knockdowns), then the joint dispersion estimation will be more precise. But it is biologically plausible that this assumption is wrong (e.g. because of the different efficiency of RNAi), which would lead to underestimation of the true biological variability (and therefore over-calling of results) in some conditions.
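
To illustrate the precision argument, here is a toy simulation in plain Python. It is a sketch under strong assumptions: normally distributed data with one shared variance, not the negative binomial model that DEXSeq fits, and the function names are hypothetical. It compares how much a pooled variance estimate scatters when it is based on 2 groups of 2 replicates (one pairwise comparison) versus 8 groups of 2 replicates (control plus seven knockdowns).

```python
import random
from statistics import mean, pstdev

def pooled_variance(groups):
    """Pooled within-group variance: squared deviations / residual df."""
    ss = 0.0
    df = 0
    for g in groups:
        m = mean(g)
        ss += sum((x - m) ** 2 for x in g)
        df += len(g) - 1
    return ss / df

def spread_of_estimate(n_groups, n_reps=2, n_trials=2000, seed=1):
    """Standard deviation of the pooled variance estimate over many
    simulated experiments whose true variance is 1 in every group."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n_reps)]
                  for _ in range(n_groups)]
        estimates.append(pooled_variance(groups))
    return pstdev(estimates)

pairwise = spread_of_estimate(n_groups=2)  # 4 samples, 2 residual df
joint = spread_of_estimate(n_groups=8)     # 16 samples, 8 residual df
print("spread, pairwise estimate:", pairwise)
print("spread, joint estimate:   ", joint)
```

For normal data the scatter of such an estimate scales as sqrt(2/df), so going from 2 to 8 residual degrees of freedom should roughly halve it. The gain only holds when the variance really is shared; if one knockdown is more variable than the rest, the pooled estimate is too small for that condition.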

That also leads to the second point - with only two replicates per condition, expectations about reproducibility of the result should be modest. No amount of statistical software can undo that.

Best wishes
	Wolfgang

On 15 Aug 2014, at 16:25, António Domingues <amjdomingues at gmail.com> wrote:

> Dear Alejandro,
> 
> I just wanted to follow up on this to say that I also see quite a big difference between the "new" and "old" DEXSeq. Not only are the numbers of differentially expressed exons much larger in the new version (in one experiment they nearly quadrupled), but the direction of change has also shifted. That is, where upon knock-down there was about 50% more exon exclusion than inclusion, it is now the other way around. It does not happen in all my knockdowns (and I have seven of them), but it is sufficient to make me wary of previous conclusions based on the old version. As before, DEXSeq was run with the default options. Perhaps my experimental design is not the best for drawing conclusions about how much the results differ between the 2 versions of DEXSeq (only 2 biological replicates per condition), but other users should bear in mind that some changes in results might happen.
> 
> Regarding my experimental design, I am building the DEXSeqDataSet object with only 2 conditions (4 samples) to do the pairwise comparisons. Since I have a control and 7 conditions, is it possible, similarly to DESeq2, to build the object, estimate the dispersion, and do the comparisons with all the samples, and then only extract the results of the comparisons of interest? And if so, does it offer a statistical advantage? My gut feeling says yes, but it says many wrong things all the time :) (I am attaching a dispersion plot from one comparison for DEXSeq 1.10; sessionInfo is at the bottom of the email)
> 
> On the matter of package changes, and I put this question up for discussion on the list: where should the threshold be for a change in a package to also warrant a change in name? Changes in function wrappers and bug corrections are all fine, but when the results stop being reproducible (and not due to bug fixing), should it be time to think about it? We have seen it happen with DESeq, which after major changes became DESeq2. This is not a dig at you, just genuine curiosity, and concern as a user.
> 
> 
> Best,
> António
> 
>> Dear Marco Marconi,
>> 
>> I think that was the version where we changed from our original method,
>> the one described in the paper, to the more recent approach. You will find
>> these details in the section "Methodological changes since publication of
>> the paper".  As you might have noticed, the dispersions are highly
>> correlated, as are the p-values.
>> 
>> I don't think the change in the p-value, and therefore in the adjusted
>> p-value, is a problem, since it is not changing dramatically.  The simplest
>> thing would be to increase your FDR threshold a bit.
>> 
>> Best regards,
>> Alejandro
>> 
>>> Hello, after performing a general Bioconductor update to the new version, I
>>> noticed that now the DEXSeq package 1.8.0 is giving me different results
>>> from the previous version 1.6.0. For a start, its functions print dots "..." on
>>> stdout, which was not done in the previous version. This is not a big
>>> issue; the problem is that now I am obtaining different results. Generally,
>>> the padjust values are bigger.
>>>
>>> For example this exon:
>>>
>>>            a1     a2     a3     b1     b2     b3
>>> EXXXX     126     90    101     81    233    225
>>>
>>> gets different results:
>>>
>>> geneID,exonID,dispersion,pvalue,padjust,meanBase,log2fold(b/a)
>>>
>>> old version:
>>> EXXXX,0.0684906370633231,0.00256847378387803,0.0321347815544768,129.941383199307,-0.217272839643456
>>>
>>> new version:
>>> EXXXX,0.0928452378435829,0.00401881761350959,0.0587521235795571,129.941383199307,-0.213275654796358
>>>
>>>
>>> As you can see, the old one has a padjust below 0.05 and the other above
>>> 0.05, which is a big problem.
>>>
>>>
>>> I had a look in the NEWS section of the DEXSeq package, but I couldn't find
>>> any information about major changes.
>>>
>>>
>>> Thank you very much, regards,
>>>
> 
> 
> sessionInfo()
> R version 3.1.1 (2014-07-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
> [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8 LC_NAME=C                  LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets methods   base
> 
> other attached packages:
> [1] ggplot2_1.0.0           plyr_1.8.1 DEXSeq_1.10.8           BiocParallel_0.6.1 DESeq2_1.4.5            RcppArmadillo_0.4.320.0
> [7] Rcpp_0.11.2             GenomicRanges_1.16.2 GenomeInfoDb_1.0.2      IRanges_1.21.43 Biobase_2.24.0          BiocGenerics_0.10.0
> 
> loaded via a namespace (and not attached):
> [1] annotate_1.42.1      AnnotationDbi_1.26.0 BatchJobs_1.3        BBmisc_1.7           biomaRt_2.20.0 Biostrings_2.32.0
> [7] bitops_1.0-6         brew_1.0-6 checkmate_1.2        codetools_0.2-8      colorspace_1.2-4 DBI_0.2-7
> [13] digest_0.6.4         fail_1.2 foreach_1.4.2        genefilter_1.46.1    geneplotter_1.42.0 grid_3.1.1
> [19] gtable_0.1.2         hwriter_1.3 iterators_1.0.7      lattice_0.20-29      locfit_1.5-9.1 MASS_7.3-33
> [25] munsell_0.4.2        proto_0.3-10 RColorBrewer_1.0-5   RCurl_1.95-4.1       reshape2_1.4 Rsamtools_1.16.0
> [31] RSQLite_0.11.4       scales_0.2.4 sendmailR_1.1-2      splines_3.1.1        statmod_1.4.20 stats4_3.1.1
> [37] stringr_0.6.2        survival_2.37-7 tools_3.1.1          XML_3.98-1.1         xtable_1.7-3 XVector_0.4.0
> [43] zlibbioc_1.10.0
> 
> -- 
> António Miguel de Jesus Domingues, PhD
> Postdoctoral researcher
> Deep Sequencing Group - SFB655
> Biotechnology Center (Biotec)
> Technische Universität Dresden
> Fetscherstraße 105
> 01307 Dresden
> 
> Phone: +49 (351) 458 82362
> Email: antonio.domingues(at)biotec.tu-dresden.de
> --
> The Unbearable Lightness of Molecular Biology
> 
> 
> <3_c_fitDispersion.png>