[BioC] DEXSeq update results change

Wed Aug 20 15:59:48 CEST 2014

Dear Antonio,

Thanks a lot for your explanations and sending your objects and code.

I had a look at your data. Apparently the difference in dispersion 
estimates between the old and the new versions of DEXSeq can change the 
coefficients of the GLM, and therefore the exon fold changes.  But these 
changes seem to affect specifically those exons with very low counts.  
For example, with the objects that you sent me:

select <- rowSums( dxr$countData ) > 10
plot( dxr_new$`log2fold_3_c_GFP_c`[select], 
dxr_old$`log2fold.3_c_c.GFP_c_c.`[select] )

After filtering out the low-count exons, the plot gives a much more 
reasonable picture: the large differences come from exons where noise is 
predominant. I will dig more into this, but I would not worry too much 
about it, since the signs of the fold changes for the significant exons 
are consistent anyway:

select2 <- which(dxr_old$padjust < 0.1)
table( dxr_new$`log2fold_3_c_GFP_c`[select2] > 0 , 
dxr_old$`log2fold.3_c_c.GFP_c_c.`[select2] > 0)

       FALSE TRUE
FALSE  1630    0
TRUE      0  614
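
For readers without the original objects, the two checks above can be sketched on simulated data. Everything in this block is an assumption made for illustration: the toy vectors `total_counts`, `lfc_old` and `lfc_new` stand in for the real count table and fold-change columns, and the noise model (estimation noise shrinking with the total count) is invented, not taken from DEXSeq.

```r
# Toy stand-ins for the real objects (hypothetical names and values).
set.seed(1)
n <- 2000
total_counts <- rnbinom(n, mu = 50, size = 0.3)  # skewed: many low-count exons
true_lfc <- rnorm(n, sd = 0.5)

# Assumed noise model: the fold-change estimate is noisier at low counts.
noise_sd <- 2 / sqrt(total_counts + 1)
lfc_old <- true_lfc + rnorm(n, sd = noise_sd)
lfc_new <- true_lfc + rnorm(n, sd = noise_sd)

# Check 1: agreement between versions under increasingly strict count filters.
thresholds <- c(0, 10, 100)
agreement <- sapply(thresholds, function(th) {
  keep <- total_counts > th
  c(threshold = th,
    n_exons = sum(keep),
    correlation = cor(lfc_old[keep], lfc_new[keep]))
})
print(t(agreement))

# Check 2: do the two versions agree on the direction of change?
keep <- total_counts > 10
sign_table <- table(new_up = lfc_new[keep] > 0, old_up = lfc_old[keep] > 0)
print(sign_table)
```

On the real data, the row sums of the count table and the two log2 fold-change columns would take the place of the simulated vectors.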

Best regards,
Alejandro

> Dear Wolfgang and Alejandro,
>
> First of all, thank you for looking into this.
>
>     can you send one or more specific examples, i.e.
>     - the count table for the affected gene(s), for all its exons,
>     and/or the plotDEXSeq output
>     - the size factors
>
>
> I have prepared a data set+script for testing that will follow in a 
> separate private email, so that you can look into this in detail. 
> While preparing it I think I spotted where the difference in results 
> might originate *(1)*.
>
> Let me clarify that my concern is not with a particular exon, but 
> rather with the general trend (ratio of up-regulated / down-regulated 
> exons) that is changed, particularly in the experimental set-up I am 
> sending you.
>
>     That also leads to the second point - with only two replicates per
>     condition, expectations about reproducibility of the result should
>     be modest. No amount of statistical software can undo that.
>
>
> I am well aware of that :) In defence of the data, I should say that the 
> experimental validation of the DGE results (for this same data) was 
> nearly 100%. So yes, few replicates can be an issue, but we have some 
> experimental validation to give us assurance that not all is bad.
>
> @ Alejandro
>
>     Just an additional question, do you see the shift in fold changes
>     for all your exons or only for a subset of them?
>     In older versions there was a bug that was causing some label
>     swaps in the result columns, but this should be fixed in the most
>     recent versions (I just want to make sure it is fixed!). As
>     Wolfgang mentions, this would become evident by looking at the
>     plotDEXSeq output (by looking at the normalized counts and exon
>     usage).
>
>
>
> The scatter plot of fold change of new vs old version is a bit funky I 
> must say:
> https://www.dropbox.com/s/l3snr4epgwbkty8/foldchange_comparison.png
>
>
> *(1)*
> while playing with the example data to send you, I noticed what could 
> be an explanation while counting significantly changed exons:
>
> https://www.dropbox.com/s/7zc4n352ftjzqqe/nHits_comparison.pdf
>
> In the old version of DEXSeq, without a fold-change cut-off, there are 
> more exons with decreased inclusion than with increased inclusion 
> (~2500/1500 exons). With increasingly higher fold-change cut-offs this 
> is inverted. For instance, with an FC cut-off of 10% it is 2000/1500, 
> and with an FC cut-off of 50% it is 80/400. So a completely different 
> trend. Using the new DEXSeq version, changing the FC cut-off makes no 
> difference: the trend is always more exons with increased inclusion, 
> which is sort of what I would expect.
>
> Could it be that the old version is less efficient at estimating the 
> fold changes when the differences are minor? Or rather, not the fold 
> changes but the dispersions. That would explain the differences I 
> observed. And we only have 2 replicates, so we cannot expect miracles 
> from DEXSeq.
>
> Best regards,
> António
>
>
> On 16 August 2014 12:24, Wolfgang Huber <whuber at embl.de> wrote:
>
>     Dear Antonio
>
>     can you send one or more specific examples, i.e.
>     - the count table for the affected gene(s), for all its exons,
>     and/or the plotDEXSeq output
>     - the size factors
>
>     This should help all of us understand better, and perhaps fix,
>     what you’re unhappy about.
>     What DEXSeq does is not a black box, it is in fact very simple, so
>     we should be able to get to the bottom of this.
>
>     Regarding the question in the second paragraph: if you have reason
>     to assume that the biological variability is the same in all your
>     conditions (knockdowns), then the joint dispersion estimation will
>     be more precise. But it is not biologically implausible that the
>     assumption may be wrong (e.g. because of the different efficiency
>     of RNAi), leading to underestimation of the true biological
>     variability (and therefore over-calling of results) in some conditions.
>
>     That also leads to the second point - with only two replicates per
>     condition, expectations about reproducibility of the result should
>     be modest. No amount of statistical software can undo that.
>
>     Best wishes
>             Wolfgang
>
>
>
> -- 
> António Miguel de Jesus Domingues, PhD
> Postdoctoral researcher
> Deep Sequencing Group - SFB655
> Biotechnology Center (Biotec)
> Technische Universität Dresden
> Fetscherstraße 105
> 01307 Dresden
>
> Phone: +49 (351) 458 82362
> Email: antonio.domingues(at)biotec.tu-dresden.de
> --
> The Unbearable Lightness of Molecular Biology