[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"

Sun Jan 16 09:55:35 CET 2005

I append below a suggested update for p.adjust().

1. A new method "yh" for control of FDR is included which is valid for any 
dependency structure. Reference is Benjamini, Y., and Yekutieli, D. (2001). 
The control of the false discovery rate in multiple testing under 
dependency. Annals of Statistics 29, 1165-1188.

2. I've re-named the "fdr" method to "bh" but kept "fdr" as a synonym for 
backward compatability.

3. Upper case values for method "BH" or "YH" are also accepted.

4. p.adust() now preserves attributes like names for named vectors (as does 
cumsum and friends for example).

5. p.adjust() now works columnwise on numeric data.frames (as does cumsum 
and friends).

6. method="hommel" now works correctly even for n=2

7. The 'n' argument is removed. Setting this argument for any methods other 
than "none" or "bonferroni" make the p-values indeterminate, and the 
argument seems to be seldom used. (It isn't used in the R default 
distribution.) I think trying to combine this argument with NAs would get 
you into lots of hot water. For example, what does 
p.adjust(c(NA,NA,0.05),n=2) mean? Which 2 values should be adjusted?

8. NAs are treated in na.exclude style. This is the correct approach for 
most applications. The only other consistent thing you could do would be to 
treat the NAs as if they all had value=1. But then you would have to 
explain clearly that the values being returned are not actually the correct 
adjusted p-values, which are unknown, but are the most conservative 
possible values assuming the worst-case for the missing values. This would 
become arbitrarily unreasonable as the number of NAs increases.

Gordon

p.adjust.methods <-
     c("holm", "hochberg", "hommel", "bonferroni", "yh", "bh", "fdr", "none")

p.adjust <- function(p, method = p.adjust.methods) {
     method <- match.arg(tolower(method),p.adjust.methods)
     if(method=="fdr") method <- "bh"
     if(is.data.frame(p)) {
         if(length(p)) for(i in 1:length(p)) p[[i]] <- 
Recall(p[[i]],method=method)
         return(p)
     }
     porig <- p
     notna <- !is.na(p)
     p <- as.vector(p[notna])
     n <- length(p)
     if (n == 1) return(porig)
     if (n == 2 && method=="hommel") method <- "hochberg"
     porig[notna] <- switch(method,
            holm = {
                i <- 1:n
                o <- order(p)
                ro <- order(o)
                pmin(1, cummax( (n - i + 1) * p[o] ))[ro]
            },
            hochberg = {
                i <- n:1
                o <- order(p, decreasing = TRUE)
                ro <- order(o)
                pmin(1, cummin( (n - i + 1) * p[o] ))[ro]
            },
            hommel = {
                i <- 1:n
                s <- sort(p, index = TRUE)
                p <- s$x
                ro <- order(s$ix)
                q <- pa <- rep.int( min(n*p/(1:n)), n)
                for (j in (n-1):2) {
                    q1 <- min(j*p[(n-j+2):n]/(2:j))
                    q[1:(n-j+1)] <- pmin( j*p[1:(n-j+1)], q1)
                    q[(n-j+2):n] <- q[n-j+1]
                    pa <- pmax(pa,q)
                }
                pmax(pa,p)[ro]
            },
            bh = {
                i <- n:1
                o <- order(p, decreasing = TRUE)
                ro <- order(o)
                pmin(1, cummin( n / i * p[o] ))[ro]
            },
            yh = {
                i <- n:1
                o <- order(p, decreasing = TRUE)
                ro <- order(o)
                q <- sum(1/(1:n))
                pmin(1, cummin(q * n/i * p[o]))[ro]
            },
            bonferroni = pmin(n * p, 1),
            none = p)
     porig
}