[BioC] EdgeR: artifacts on BCV plot

Gordon K Smyth smyth at wehi.EDU.AU
Mon Feb 3 00:27:10 CET 2014


On Sun, 2 Feb 2014, Adriaan Sticker wrote:

> Thanks a lot for your input, Gordon!
>
> I'm still a bit puzzled why the deviances don't have to follow a
> chi-squared distribution when you estimate tagwise dispersions (that's
> what you are looking at with the gof plot, I guess). I put examples of
> the GOF plots in the attachment: one plot based on tagwise dispersions
> with my manually adjusted prior.df of 25, one where the prior.df is
> estimated by estimateDisp() at 9, and a third with robust estimation. I
> also included the corresponding BCV plots for completeness. It seems like
> you overestimate the variation for the higher values.

But we don't.

> If the true deviances do not follow the theoretically expected chi^2
> distribution under the null, how are the p-values you get from the
> glmLRT function still correct?

The p-values are calculated from deviance differences, not the residual 
deviance itself.  The former is chi-square distributed; the latter is not.
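
To make the distinction between a residual deviance and a deviance
difference concrete, here is a minimal sketch of a likelihood ratio test
for one gene in two groups. It is not edgeR's implementation: it assumes
Poisson counts (a negative binomial with dispersion zero) and equal library
sizes, whereas glmLRT works with negative binomial GLMs. The function names
poisson_deviance and lrt_two_groups are hypothetical. The p-value comes
from the difference between the null and full deviances, referred to a
chi-square distribution on 1 degree of freedom.

```python
import math


def poisson_deviance(counts, fitted):
    """Residual deviance of a Poisson fit: 2 * sum(y*log(y/mu) - (y - mu))."""
    total = 0.0
    for y, mu in zip(counts, fitted):
        # The y*log(y/mu) term is defined as 0 when y == 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


def lrt_two_groups(group_a, group_b):
    """Likelihood ratio test of 'one common mean' against 'one mean per group'.

    Returns (null deviance, full deviance, deviance difference, p-value).
    Assumes Poisson counts and equal library sizes (illustration only).
    """
    counts = list(group_a) + list(group_b)
    mean_all = sum(counts) / len(counts)
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)

    # Residual deviance of each nested model.
    null_dev = poisson_deviance(counts, [mean_all] * len(counts))
    full_dev = poisson_deviance(
        counts, [mean_a] * len(group_a) + [mean_b] * len(group_b)
    )

    # The test statistic is the deviance difference, not either deviance alone.
    lr = null_dev - full_dev

    # Upper tail of a chi-square on 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return null_dev, full_dev, lr, p_value


if __name__ == "__main__":
    null_dev, full_dev, lr, p = lrt_two_groups([5, 6, 4], [50, 60, 40])
    print("null deviance:", null_dev)
    print("full deviance:", full_dev)
    print("deviance difference:", lr)
    print("p-value:", p)
```

Only the deviance difference enters the p-value here; the full-model
residual deviance is never compared with a chi-square distribution.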

> Maybe I understand this gof plot wrong, I noticed it's also not mentioned
> in the manual.

It's not mentioned in the manual because you don't need it.  It was used 
to demonstrate the inadequacy of the common or trended dispersion models.

Gordon

> Note that I also find about 100 more differentially expressed genes with
> my manually set prior.df (320 vs 219 genes), so it makes a big difference.
>
> Greetings
>
>
> 2014-02-02 Gordon K Smyth <smyth at wehi.edu.au>:
>
>> Dear Adriaan,
>>
>>
>> On Sun, 2 Feb 2014, Adriaan Sticker wrote:
>>
>>  Dear Gordon,
>>>
>>> Thanks a lot for your input. I tried the automatic prior.df estimation of
>>> the estimateDisp() function, and it suggests a much lower prior.df than I
>>> set manually (9 instead of 25). But when I look at the gof plot, it's way
>>> off. I thought that a good guide for prior.df estimation is to look for a
>>> value that puts the calculated deviances as close as possible to the
>>> theoretical expected values (this is the prior.df for which your deviances
>>> lie straight on the diagonal line of the gof / QQ plot).
>>>
>>
>> No, this isn't so.  The value returned by estimateDisp() is better.
>>
>> Plotting the gof is valid for showing that the common or trended
>> dispersion models are inadequate, but the QQ plot of the GOF statistics
>> doesn't work properly any more once the tagwise dispersions have been
>> estimated.  This is because the tagwise dispersions are estimated from the
>> same genewise data that is being plotted.
>>
>> I admit that we have not made that sufficiently clear in the documentation.
>>
>> Best wishes
>> Gordon
>>
>>
>>
>>  Or am I wrong here?
>>>
>>> Best Regards
>>> Adriaan
>>>
>>>
>>> 2014-02-02 Gordon K Smyth <smyth at wehi.edu.au>:
>>>
>>>  Date: Fri, 31 Jan 2014 11:59:13 +0000
>>>>
>>>>> From: Adriaan Sticker <adriaan.sticker at gmail.com>
>>>>> To: Ryan <rct at thompsonclan.org>
>>>>> Cc: bioconductor at r-project.org
>>>>> Subject: Re: [BioC] EdgeR: artifacts on BCV plot
>>>>>
>>>>> Hi
>>>>> Thanks for your input. I manually checked the counts of the lowest BCV
>>>>> values (see below), and I see nothing strange, except that the
>>>>> counts are all on the low side. So I think I will keep them in.
>>>>> Is it correct to think that the reason they appear on one horizontal line
>>>>> is because of the discreteness of the counts?
>>>>>
>>>>>
>>>> No, it is not because of discreteness.  It is because zero is
>>>> mathematically a perfectly possible value for the BCV.
>>>>
>>>> These genes appear to show variability that is equal to or less than Poisson
>>>> variability, even after pulling them up towards the dispersion trend.  In
>>>> other words, these genes are not showing any evidence of differences
>>>> between biological replicates.
>>>>
>>>> Gordon
>>>>
>>>>  Greetings
>>>>
>>>>> Adriaan
>>>>>
>>>>>
>>>> ______________________________________________________________________
>>>> The information in this email is confidential and intended solely for the
>>>> addressee.
>>>> You must not disclose, forward, print or use it without the permission of
>>>> the sender.
>>>> ______________________________________________________________________
>>>>
>>>>
>>>
>>
>
