[BioC] edgeR and FDR

Gordon K Smyth smyth at wehi.EDU.AU
Sat Jun 26 16:39:01 CEST 2010


Hi Naomi,

I agree that the discreteness of the counts introduces conservatism, and 
that there is a power differential between lowly and highly expressed genes. 
However, the expected overall FDR is still controlled at a rate less than 
or equal to the nominal rate, and that is all we promise.
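
As a rough illustration of the conservatism (a sketch under simplifying 
assumptions, not edgeR's actual code): take two libraries of equal size with 
Poisson counts, so that conditional on a gene's total count n, the null count 
in one library is Binomial(n, 0.5).  The helper names null_pvalue_dist and 
size_at are made up for this example.

```r
# Null distribution of the exact two-sided p-value, conditional on total n.
# Assumption: x ~ Binomial(n, 0.5) under the null (equal library sizes,
# no overdispersion).
null_pvalue_dist <- function(n) {
  x <- 0:n
  prob <- dbinom(x, n, 0.5)
  # p-value = total probability of outcomes no more likely than the observed
  # one (small relative tolerance so symmetric outcomes count as ties)
  pval <- sapply(prob, function(p) sum(prob[prob <= p * (1 + 1e-7)]))
  data.frame(x = x, prob = prob, pval = pmin(pval, 1))
}

# Actual probability of rejecting at level alpha when the null is true
size_at <- function(n, alpha = 0.05) {
  d <- null_pvalue_dist(n)
  sum(d$prob[d$pval <= alpha])
}

totals <- c(5, 10, 20, 50, 200, 1000)
sizes <- sapply(totals, size_at)
print(data.frame(total = totals, actual_size = round(sizes, 4)))
```

The actual size never exceeds the nominal 0.05, and for small totals it falls 
well below it: with a total of 5 the smallest attainable p-value is 0.0625, so 
such a gene can never be rejected at the 0.05 level.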

To reduce the trend in DE vs expression level, I like to combine FDR with 
a fold-change cutoff or, perhaps better, use a TREAT-like test.
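
A minimal sketch of the first option, in base R only.  The toy table, its 
column names (logFC, PValue) and both cutoffs are assumptions for 
illustration; in practice the table would come from your own results.  This 
is a simple post-hoc filter, not a TREAT test.

```r
# Toy results table (hypothetical values and column names)
tab <- data.frame(
  logFC  = c(3.1, -0.2, 1.5, -2.4, 0.4, 0.1),
  PValue = c(1e-6, 0.03, 0.002, 1e-4, 0.04, 0.6),
  row.names = paste0("gene", 1:6)
)

# Benjamini-Hochberg adjusted p-values
tab$FDR <- p.adjust(tab$PValue, method = "BH")

# Illustrative cutoffs; the choice is up to the analyst
fdr_cutoff <- 0.05
lfc_cutoff <- 1   # absolute log2 fold change

# Keep genes passing both the FDR and the fold-change cutoff
selected <- tab[tab$FDR < fdr_cutoff & abs(tab$logFC) > lfc_cutoff, ]
print(selected)
```

Here five genes pass the FDR cutoff alone, but only gene1, gene3 and gene4 
also pass the fold-change cutoff.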

Regards
Gordon

On Sat, 26 Jun 2010, Naomi Altman wrote:

> Dear Gordon,
> Thank you for your very detailed and clear answer to my question about the 
> dispersion model.
>
> Regarding FDR:
> For discrete-valued test statistics, the distribution of the p-values under 
> the null hypothesis is a discrete uniform which depends on the marginal 
> total.  As a result, the distribution of p-values from the null hypotheses 
> is a mixture of discrete uniforms, which can be marginally very 
> non-uniform.  Even after 
> filtering out low expressing genes, it is common to see a peak of p-values 
> near 1.0 due to this effect.  It is less evident that there are multiple 
> other peaks, one at each of the discrete values of the p-value for each 
> marginal total.  The result of this is that FDR computations are far too 
> conservative for lowly expressing genes, and far too liberal for highly 
> expressing genes, which basically magnifies the power differential that 
> already exists due to the relationship between the mean and variance.
>
> --Naomi
>
> At 05:01 AM 6/26/2010, Gordon K Smyth wrote:
>> Dear Zhe,
>> 
>> To get FDR, you must use the topTags() function.  Is your de.com object a 
>> deDGEList object?  If it is, then
>>
>>   top <- topTags(de.com, n=Inf)
>>   write.table(top$table, file="yourfile.txt")
>> 
>> will do what you want.  (I can't tell you what level of FDR to use as your 
>> cutoff though, that's up to you.)
>> 
>> Naomi, I don't know of any problem with FDR from edgeR.  It should work 
>> just fine.
>> 
>> Best wishes
>> Gordon
>> 
>> -----------------------------------------------
>> Associate Professor Gordon K Smyth,
>> NHMRC Senior Research Fellow,
>> Bioinformatics Division, Walter and Eliza Hall Institute of Medical 
>> Research, 1G Royal Parade, Parkville, Vic 3052, Australia.
>> smyth at wehi.edu.au
>> http://www.wehi.edu.au
>> http://www.statsci.org/smyth
>> 
>> 
>> 
>> ------------ original message ---------------
>> [BioC] edgeR question
>> Naomi Altman naomi at stat.psu.edu
>> Fri Jun 25 22:43:51 CEST 2010
>> 
>> Hi Zhe,
>> 1. First normalize and then do the DE
>> analysis.  (I found this confusing in the vignette, too.)
>> 
>> 2. I do not suggest using FDR at this time.  The
>> standard FDR computations need to be adjusted for
>> count data.  I do not think this has been worked out yet.
>> 
>> --Naomi
>> 
>> 
>> At 12:21 PM 6/25/2010,  wrote:
>> 
>>> Hello,
>>> 
>>> I am learning edgeR and would like to use it
>>> to analyse my Tag-seq and RNA-seq data. I have several questions:
>>> 
>>> 1. Does the DE analysis using common
>>> dispersion or moderated tagwise dispersions use
>>> the TMM method for normalization?  I am not
>>> sure about the relationship between Section 6
>>> (Normalization) and the following sections in
>>> the user manual. I suppose I should normalize
>>> the data first, and then perform DE analysis.
>>> 
>>> 2. Do you suggest using P-value < 0.01? What
>>> about FDR < 0.05? After saving de.tagwise (>
>>> write.table(de.com[[1]], file =
>>> "/Users/Zhe/edgeR/page7", sep = "\t")), I found
>>> there is no FDR column. How can I
>>> calculate the FDR for each gene and save it in the output file?
>>> 
>>> Thanks a lot.
>>> Best wishes,
>>> 
>>> Zhe
>> 
>
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
>
>
