[BioC] limma, FDR, and p.adjust
Marcus Davy
MDavy at hortresearch.co.nz
Mon Dec 20 06:14:39 CET 2004
Mark,
there is a fdr website link via Yoav Benjamini's homepage which is: http://www.math.tau.ac.il/%7Eroee/index.htm
On it you can download an S-Plus function (under the downloads link) which calculates the false discovery rate threshold alpha level using stepup, stepdown, dependence methods etc.
Some changes are required to the plotting code when porting it to R. I removed the xaxs="s" arguement on line 80. The fdr function requires a list of p-values as input, a Q-value (*expected* false discovery rate control at level Q) and a required method of fdr controlling procedure.
> As you can see after running the code, the p values are truly being
> adjusted, but for what FDR? If I set my p value at 0.05, does that mean
> my FDR is 5%? I have been told by someone that is the case but,
> normally, when discussing FDR, q values are reported or just one p value
> is reported--the threshold for a set FDR. The p.adjust documentation is
> unclear.
The p.adjust function appears to be using the "stepup" fdr controlling procedure when method="fdr" is specified. It adjusts the
p-values so that FDR control is at the desired level alpha over the entire range (0,1), which gives the same result as specifying a
Q-value in the fdr function itself calculating a false discovery rate threshold alpha level so that FDR<=Q.
So it adjusts for all FDR desired levels. If your p-value threhold is 0.05 then the expected proportion of false discoveries is 5%.
e.g.
n <- 1000
pi0 <- 0.5
x <- rnorm(n, mean=c(rep(0, each=n*pi0), rep(3, each=n - (n*pi0))))
p <- 2*pnorm( -abs(x))
p <- sort(round(p,3))
p.adjusted <- p.adjust(p, method="fdr")
# Controlling fdr at Q, and p.adjust at level alpha
Qvalue <- alpha <- 0.05
threshold <- fdr(p, Q=Qvalue, method="stepup") # fdr function available from the website link above
threshold
plot(p, p.adjusted)
abline(v=threshold, lty=2)
abline(h=alpha, lty=2)
> # Stepup FDR control at Q=0.05
> sum(p <= threshold)
[1] 372
>
> # p.adjust(ed) p-values at level alpha=0.05
> sum(p.adjusted <= alpha)
[1] 372
Simultaneously modifying Qvalue, and alpha above to a different expected proportion of false discoveries should still produce identically sized rejected lists.
Hope that helps.
marcus
>>> "Kimpel, Mark W" <mkimpel at iupui.edu> 20/12/2004 3:57:43 AM >>>
I am posting this to both R and BioC communities because I believe there
is a lot of confusion on this topic in both communities (having searched
the mail archives of both) and I am hoping that someone will have
information that can be shared with both communities.
I have seen countless questions on the BioC list regarding limma
(Bioconductor) and its calculation of FDR. Some of them involved
misunderstandings or confusions regarding across which tests the FDR
"correction" is being applied. My question is more fundamental and
involves how the FDR method is implemented at the level of "p.adjust"
(package: stats).
I have reread the paper by Benjamini and Hochberg (1995) and nowhere in
their paper do they actually "adjust" p values; rather, they develop
criteria by which an appropriate p value maximum is chosen such that FDR
is expected to be below a certain threshold.
To try to get a better handle on this, I wrote the following simple
script to generate a list of random p values, and view it before and
after apply p.adjust (method=fdr).
rn<-abs(rnorm(100, 0.5, 0.33))
rn<-rn[order(rn)]
rn<-rn[1:80]
rn
p.adj<-p.adjust(rn, method="fdr")
p.adj
As you can see after running the code, the p values are truly being
adjusted, but for what FDR? If I set my p value at 0.05, does that mean
my FDR is 5%? I have been told by someone that is the case but,
normally, when discussing FDR, q values are reported or just one p value
is reported--the threshold for a set FDR. The p.adjust documentation is
unclear.
For the R developers, I can understand how one would want to include FDR
procedures in p.adjust, but I wonder, given the numerous FDR algorithms
now available, if it would be best to formulate an FDR.select function
that would be option to p.adjust and itself incorporate more recent FDR
procedures than the one proposed by Benjamini and Hochberg in 1995.
(Benjamini himself has a newer one). Some of these may currently be
available as add-on packages but they are not standardized regarding I&O
and this makes it difficult for developers to incorporate them into
packages such as limma.
So those are my questions and suggestions,
Thanks,
Mark W. Kimpel MD
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
______________________________________________________
The contents of this e-mail are privileged and/or confidenti...{{dropped}}
More information about the Bioconductor
mailing list