[BioC] Drastic change in edgeR p-value calculation?

Gordon K Smyth smyth at wehi.EDU.AU
Thu Dec 20 01:45:10 CET 2012


Dear Lee Brooks,

The current version will behave almost the same as the older version if 
you set the prior.df to the old default value:

   estimateTagwiseDisp(y, prior.df=20)

It seems that you have some genes with large fold changes but also large 
variability between replicates.  With a large prior.df, the dispersions 
for these genes are forced down towards the global trend, and this is 
apparently enough to allow them to be statistically significant.  With a 
smaller prior.df, the tagwise dispersions remain closer to individual 
dispersion values, so the genes are penalized more for their large 
variability, and they become non-significant.

Best wishes
Gordon


> Date: Tue, 18 Dec 2012 13:35:11 -0500
> From: "Lionel (Lee) Brooks 3rd" <Lionel.Brooks.GR at Dartmouth.edu>
> To: bioconductor at r-project.org
> Subject: [BioC] Drastic change in edgeR p-value calculation?
>
> Hello Gentlepeople,
>
> I have been using the edgeR package to identify differentially abundant 
> tags in an RNA-seq experiment.
>
> I was happy when the edgeR package version 2.4.6 called 41 
> differentially abundant tags.
>
> However, now I am unhappy because the same analysis with edgeR package 
> version 3.0.2 does not call any differentially abundant tags.
>
> I've compared the output of the two versions and I can see that both 
> versions calculate the same fold changes - however - the p-value 
> calculation is different.
>
> I feel that the optimal solution for me is to determine the command 
> arguments that I can use to emulate the previous default behavior.
>
> Can anyone help me figure that out?
> I would be extremely grateful for some guidance.
> This way, I could keep my bioconductor packages up to date.
>
> If you think you can help then please see below, for the code and
> session Info.
>
> Thank you for your time.
>
> sincerely,
> Lee Brooks
>
> ----
> HERE IS THE CODE PART
>
> Here is the procedure that I have been using,
> where x is the data frame object containing my data:
> y <- DGEList(counts=x,group=group)
> y <- calcNormFactors(y)
> y <- estimateCommonDisp(y)
> y <- estimateTagwiseDisp(y)
> et <- exactTest(y)
> topTags(et)
> de <- decideTestsDGE(et, p=0.05, adjust="BH")
> detags <- et[as.logical(de)]
> detags <- topTags(detags, n = length(detags))
>
> ----
> HERE IS AN EXAMPLE OUTPUT FROM EACH VERSION:
>
> Tables Tables
>
> 	logFC 	logCPM 	PValue 	FDR
> fooTag_edgeR_2.4.6 	5.12318 	12.6577 	2.9E-30 	8.9E-26
> fooTag_edgeR_3.0.2 	5.12318 	12.6612 	0.00717 	1
>
>
> -----
> HERE IS THE sessionInfo() PART...
>
> THE CONFIGURATION THAT WORKED FOR ME:
>
> R version 2.14.1 (2011-12-22)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods base
>
> other attached packages:
> [1] edgeR_2.4.6  limma_3.10.3
>
> THE CONFIGURATION THAT DOES NOT WORK FOR ME:
>
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>  [7] LC_PAPER=C                LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods base
>
> other attached packages:
> [1] edgeR_3.0.2  limma_3.14.1
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list