[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"

Mon Jan 17 22:02:39 CET 2005

>>>>> "GS" == Gordon Smyth <smyth at wehi.edu.au>
>>>>>     on Sun, 16 Jan 2005 19:55:35 +1100 writes:

    GS> I append below a suggested update for p.adjust().  

thank you.

    GS> 1. A new method "yh" for control of FDR is included which is
    GS> valid for any dependency structure. Reference is
    GS> Benjamini, Y., and Yekutieli, D. (2001).  The control of
    GS> the false discovery rate in multiple testing under
    GS> dependency. Annals of Statistics 29, 1165-1188.

good, thanks!

    GS> 2. I've re-named the "fdr" method to "bh" but kept "fdr"
    GS> as a synonym for backward compatability.
ok

    GS> 3. Upper case values for method "BH" or "YH" are also
    GS> accepted.

I don't see why we'd want this.  The S language is
case-sensitive and we don't want to lead people to believe
that case wouldn't matter.

    GS> 4. p.adust() now preserves attributes like names for
    GS> named vectors (as does cumsum and friends for example).

good point; definitely desirable!!

    GS> 5. p.adjust() now works columnwise on numeric
    GS> data.frames (as does cumsum and friends).

well, "cusum and friends" are either generic or groupgeneric
(for the "Math" group) -- there's a Math.data.frame group
method.
This is quite different for p.adjust which is not generic and
I'm not (yet?) convinced it should become so.

People can easily use sapply(d.frame, p.adjust, method) if needed; 

In any case it's not in the spirit of R's OO programming to
special case "data.frame" inside a function such as p.adjust

    GS> 6. method="hommel" now works correctly even for n=2

ok, thank you (but as said, in R 2.0.1 the behavior was much
more problematic)

    GS> 7. The 'n' argument is removed. Setting this argument
    GS> for any methods other than "none" or "bonferroni" make
    GS> the p-values indeterminate, and the argument seems to be
    GS> seldom used. (It isn't used in the R default
    GS> distribution.) I think trying to combine this argument
    GS> with NAs would get you into lots of hot water. For
    GS> example, what does p.adjust(c(NA,NA,0.05),n=2) mean?
    GS> Which 2 values should be adjusted?

I agree that I don't see a good reason to allow specifying 'n'
as argument unless e.g. for "bonferroni".
What do other think ?

    GS> 8. NAs are treated in na.exclude style. This is the
    GS> correct approach for most applications. The only other
    GS> consistent thing you could do would be to treat the NAs
    GS> as if they all had value=1. But then you would have to
    GS> explain clearly that the values being returned are not
    GS> actually the correct adjusted p-values, which are
    GS> unknown, but are the most conservative possible values
    GS> assuming the worst-case for the missing values. This
    GS> would become arbitrarily unreasonable as the number of
    GS> NAs increases.

I now agree that your proposed default behavior is more sensible
than my proposition.
I'm not sure yet if it wasn't worth to allow for other NA
treatment, like the "treat as if 1" {which my code proposition
was basically doing} or rather mre sophisticated procedure like
"integrating" over all P ~ U[0,1] marginals for each missing
value, approximating the integral possibly by "Monte-Carlo" 
even quasi random numbers.