[BioC] EdgeR: GoF plot and prior df

Gordon K Smyth smyth at wehi.EDU.AU
Mon Feb 3 23:10:48 CET 2014


On Mon, 3 Feb 2014, Adriaan Sticker wrote:

> Dear Gordon,
>
> Thanks a lot for your patience. I'm still a novice in the field. When I
> read the 2012 McCarthy paper, I was somehow under the impression that you
> also assumed a chisquare distribution of the deviance residuals in the GOF
> plots and the better the fit with the theoretical quantiles, the better the
> model.

The GOF plots were used in McCarthy et al to show the inadequacy of the 
common and trended dispersion models, and in those cases the GoF plot is 
valid.  The common and trended dispersion models are estimated from the 
global data so that the data from each individual gene has little 
influence on its own dispersion estimation, so the residual deviance is 
close to chisquare if the model is correct.

At the time we wrote the 2012 paper, we did not yet understand ourselves 
that the GoF plot will look flatter than the 1-1 line when the prior df is 
optimally estimated.
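
A rough way to see why this happens is a normal-theory analogue. This is only a sketch, not edgeR's negative binomial computation: the function name, the scaled inverse chisquare prior with prior variance 1, and the values d = 4 and d0 = 9 are all assumptions chosen for illustration. Each simulated gene has a residual sum of squares on d df. Dividing it by the gene's true variance gives a chisquare statistic. Dividing it by a variance that was squeezed towards the prior value using that same residual sum of squares gives a statistic that can never exceed d0 + d and has less spread than chisquare, so its QQ plot is flatter than the 1-1 line.

```python
import random
import statistics


def simulate_gof(n_genes=5000, d=4, d0=9.0, seed=1):
    """Normal-theory analogue of the GoF statistic (illustrative assumption).

    Returns two lists of per-gene statistics:
      known    -- RSS divided by the gene's true variance (chisquare on d df)
      squeezed -- RSS divided by a variance shrunk towards the prior value 1,
                  where the shrunken variance is computed from the same RSS
    """
    rng = random.Random(seed)
    known, squeezed = [], []
    for _ in range(n_genes):
        # True genewise variance: d0 / chisquare(d0), i.e. scaled inverse
        # chisquare with prior variance 1 and prior df d0.
        sigma2 = d0 / rng.gammavariate(d0 / 2.0, 2.0)
        # Residual sum of squares: sigma2 * chisquare(d).
        rss = sigma2 * rng.gammavariate(d / 2.0, 2.0)
        known.append(rss / sigma2)
        # Squeezed variance: weighted average of the prior variance and the
        # gene's own variance estimate, with weights d0 and d.
        s2_post = (d0 * 1.0 + rss) / (d0 + d)
        squeezed.append(rss / s2_post)
    return known, squeezed


if __name__ == "__main__":
    known, squeezed = simulate_gof()
    for label, stat in (("true variance", known), ("squeezed variance", squeezed)):
        print(label, "mean:", round(statistics.mean(stat), 2),
              "variance:", round(statistics.variance(stat), 2),
              "max:", round(max(stat), 2))
```

In this analogue both statistics have mean close to d, but the squeezed version has a visibly smaller variance, and the gap grows as d0 gets smaller. A larger prior df pulls the statistic back towards chisquare, which fits the observation that a prior df of 25 sits closer to the diagonal than 9 without that making it a better estimate.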

> Anyway, is there a paper somewhere that describes how estimateDisp() 
> decides on the prior? I looked into the source code but I'm afraid I 
> don't completely grasp how it works.

Here's a recent paper describing estimateDisp:

   http://www.statsci.org/smyth/pubs/edgeRChapterPreprint.pdf

Best wishes
Gordon

> Kind regards
> Adriaan
>
>
> 2014-02-02 Gordon K Smyth <smyth at wehi.edu.au>:
>
>>
>> On Sun, 2 Feb 2014, Adriaan Sticker wrote:
>>
>>  Thanks a lot for your input, Gordon!
>>>
>>> I'm still a bit puzzled why the deviances don't have to follow a
>>> chi-squared distribution when you estimate tagwise dispersions (that's
>>> what you are looking at with the GOF plot, I guess). I have attached
>>> examples of the GOF plots: one based on tagwise dispersions with my
>>> manually adjusted prior df of 25, one with the prior.df estimated by
>>> estimateDisp() at 9, and a third with robust estimation. I also attached
>>> the corresponding BCV plots for completeness. It seems like you
>>> overestimate the variation for the higher values.
>>>
>>
>> But we don't.
>>
>>
>>  If the true deviances do not follow the theoretically expected chi^2
>>> distribution under the null, how are the p-values you get from the glmLRT
>>> function still correct?
>>>
>>
>> The p-values are calculated from deviance differences, not from the
>> residual deviance itself.  The former is chisquare, the latter is not.
>>
>>
>>  Maybe I misunderstand this GOF plot; I noticed it's also not mentioned
>>> in the manual.
>>>
>>
>> It's not mentioned in the manual because you don't need it.  It was used
>> to demonstrate the inadequacy of the common or trended dispersion models.
>>
>> Gordon
>>
>>
>>  Note that I also find 100 more differentially expressed genes with my
>>> manually set prior.df (320 vs 219 genes), so it makes a big difference.
>>>
>>> Greetings
>>>
>>>
>>> 2014-02-02 Gordon K Smyth <smyth at wehi.edu.au>:
>>>
>>>  Dear Adriaan,
>>>>
>>>>
>>>> On Sun, 2 Feb 2014, Adriaan Sticker wrote:
>>>>
>>>>  Dear Gordon,
>>>>
>>>>>
>>>>> Thanks a lot for your input. I tried the automatic prior.df estimation
>>>>> of the estimateDisp() function, and it suggests a much lower prior.df
>>>>> than the one I set manually (9 instead of 25). But when I look at the
>>>>> GOF plot, it's way off. I thought that a good guide for choosing the
>>>>> prior.df was to look for a value that puts the calculated deviances as
>>>>> close as possible to the theoretical expected values (that is, the
>>>>> prior.df for which the deviances lie straight on the diagonal line of
>>>>> the GOF / QQ plot).
>>>>>
>>>>>
>>>> No, this isn't so.  The value returned by estimateDisp() is better.
>>>>
>>>> Plotting the GOF is valid for showing that the common or trended
>>>> dispersion models are inadequate, but the QQ plot of the GOF statistics
>>>> doesn't work properly any more once the tagwise dispersions have been
>>>> estimated.  This is because the tagwise dispersions are estimated from
>>>> the same genewise data that is being plotted.
>>>>
>>>> I admit that we have not made that sufficiently clear in the
>>>> documentation.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>
>>>>
>>>>  Or am I wrong here?
>>>>
>>>>>
>>>>> Best Regards
>>>>> Adriaan
