[BioC] Differential expression testing for groups with unequal variances/dispersions?

Gordon K Smyth smyth at wehi.EDU.AU
Mon May 27 07:12:54 CEST 2013


On Sat, 25 May 2013, Ryan C. Thompson wrote:

> Hi Gordon,
>
> Thanks for the tips. You say that edgeR should be conservative when the 
> equal dispersion assumption is violated, but this is not my experience. 
> (I probably wouldn't have asked here on the list unless I was worried 
> about false positives.) What I've seen is that with all 4 groups 
> included in a single analysis, the low-dispersion time points drag down 
> the overall dispersion estimate, and this results in (apparently) 
> anticonservative results when testing for differential modification 
> between the two high-dispersion time points.

Yes, that could happen.
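
A rough numeric sketch of that effect, using the four per-timepoint 
dispersions quoted further down from the original post.  It assumes a plain 
average as a stand-in for a single shared dispersion (edgeR estimates 
dispersions by likelihood methods, not by averaging) and a hypothetical 
mean count of 100, so the numbers are illustrative only.

```python
# Per-timepoint dispersions as reported in the original post.
dispersions = {
    "0 hours": 0.407,
    "24 hours": 0.505,
    "120 hours": 0.115,
    "2 weeks": 0.0531,
}

# ASSUMPTION: a plain mean stands in for one shared dispersion.
# This is not how edgeR computes its estimate.
pooled = sum(dispersions.values()) / len(dispersions)


def nb_variance(mu, phi):
    """Negative binomial variance for mean mu and dispersion phi."""
    return mu + phi * mu ** 2


mu = 100.0  # hypothetical mean count, chosen only for illustration

# Ratio of the variance implied by the shared dispersion to the variance
# implied by each group's own dispersion.
#   ratio < 1: shared value understates the variance (anticonservative)
#   ratio > 1: shared value overstates the variance (conservative)
ratios = {
    name: nb_variance(mu, pooled) / nb_variance(mu, phi)
    for name, phi in dispersions.items()
}

print(f"shared dispersion (plain mean): {pooled:.3f}")
for name, ratio in ratios.items():
    label = "understated" if ratio < 1 else "overstated"
    print(f"{name:>9}: variance ratio {ratio:.2f} ({label})")
```

Under these assumptions the shared value understates the variance for the 
0 hour and 24 hour groups and overstates it for the 120 hour and 2 week 
groups, which matches the direction of the effect described above.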

> Obviously, I don't have a gold standard to compare against to conclude 
> that the test is anticonservative, but I can compare the results to 
> previous analyses that I did before the final low-dispersion time point 
> had come off the sequencer, and as expected, including the 
> low-dispersion timepoint inflated the significance of most P-values in 
> all contrasts.
>
> So, to get around this, would you recommend testing between time points by 
> first subsetting the DGEList to just the two time points being compared and 
> then re-estimating the dispersions, then finally conducting the test? That 
> way, each individual test would be "self-contained" and not affected by 
> groups that are not being tested.

That could be a sensible way to go, but it's up to you.  I don't recommend 
this as something to do routinely.

Why are the earlier time points so variable?  Different protocol? 
Presumably it is a technical issue -- it seems unlikely that a biological 
response would be more variable at time zero than at later times, and the 
dispersions seem very high.  Can the high variability of the earlier 
samples be mitigated by filtering or by removing an outlier library?

If you are convinced that the difference in variability is real and not 
removable, and if the counts are generally not too small, then you could 
also try the voom option.  Voom could allow you to analyse all the 
libraries together and still take account of variability in each group. 
What you want to do is what voomaByGroup() does, but for ChIP-seq instead 
of microarrays.  That's only a suggestion -- I have not seriously tested 
voom() myself on ChIP-seq data.

> I could imagine that under these conditions, edgeR might be 
> conservative, as you say.

I would expect so.

Best wishes
Gordon

> -Ryan Thompson
>
> On Sat May 25 04:28:39 2013, Gordon K Smyth wrote:
>> Hi Ryan,
>> 
>> edgeR can't.
>> 
>> voom can, but you have to put it together partly yourself.  Just fit
>> voom to each timepoint separately, then cbind the voom output objects
>> back together.
>> 
>> Or else just proceed in edgeR as if the dispersions are equal across
>> timepoints.  This will be conservative but won't give false positive
>> results.
>> 
>> Best wishes
>> Gordon
>> 
>>> Date: Fri, 24 May 2013 12:10:09 -0700
>>> From: "Ryan C. Thompson" <rct at thompsonclan.org>
>>> To: bioconductor <Bioconductor at r-project.org>
>>> Subject: [BioC] Differential expression testing for groups with
>>>     unequal variances/dispersions?
>>> 
>>> Hi all,
>>> 
>>> I am studying a ChIP-Seq dataset (looking at gene promoter regions in
>>> human) where it appears that different experimental groups have widely
>>> different dispersions/variances using edgeR/limma. I have 4 timepoints,
>>> and if I use edgeR to compute the dispersion for each timepoint
>>> separately, I get:
>>> 
>>> 0 hours: 0.407
>>> 24 hours: 0.505
>>> 120 hours: 0.115
>>> 2 weeks: 0.0531
>>> 
>>> So the dispersion seems to range from 0.05 to 0.5. I am looking to test
>>> for "differential modification" between these timepoints, as well as
>>> between cell types at each timepoint, etc., and I was wondering if there
>>> is any differential expression test (or dispersion estimation method?)
>>> that can handle groups with different dispersions/variances.
>>> 
>>> For reference, here is my experimental design as an Excel spreadsheet:
>>> https://www.dropbox.com/s/3vnk4mai3dh39yv/chipseq-samples.xlsx
>>> 
>>> And here is the result of plotBCV on each group (look at the last 4
>>> pages for the time point groups):
>>> https://www.dropbox.com/s/s4caq1p0h3e4zhm/groupdisps.pdf (Warning: big
>>> PDF with lots of points which may bring your PDF reader to its knees.)
>>> 
>>> -Ryan Thompson