[R] ANOVA "ex post" Analysis

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Mon May 10 17:58:21 CEST 1999


Ben Bolker <ben at eno.princeton.edu> writes:

>   [ multiple comparisons question ]
> 
>   In my opinion a multiple comparisons test is certainly worth having.  I
> think it's just that no-one has gotten around to coding one.  (I do know
> as well that multiple comparisons tests are sometimes frowned on by "real"
> statisticians as being on the slippery slope leading to fishing
> expeditions.)

On the contrary, I feel that they are very effective in *preventing*
people from going on fishing expeditions when they see how their
evidence vanishes in thin air as corrections are applied...

The problem in relation to programs like R is that the tests are
typically not very general. A good many of them really only work for
comparing means from a balanced design with equal replications per
factor level, with constant variance and (perhaps not all that
important) Normal error distribution.

Thus, they won't fit into lm() or any such general framework, which of
course shouldn't necessarily keep anyone from coding a special-purpose
function for the standard case including the 20-odd variations on the
multiple comparison issue. 

However, since I hardly ever see a "standard case" analysis (except
when teaching....), I'm not highly motivated to write a function for
it. 

Of generally applicable techniques, there really isn't much beyond
Bonferroni correction and elaborations of it. Thomas Lumley posted the
following a long time ago (perhaps we should include them in base?).
The Holm method is the one that makes most sense to me.

---------------------------------------------------------
From:    Thomas Lumley <thomas at biostat.washington.edu>
Date:    Tue Jul 14,  5:33pm -0700
To:      Matthew Kay <mwkay>

Here is code for three p-value based post hoc tests: bonferroni, holm, and
hochberg. Holm is better than Bonferroni under all circumtances. Hochberg
is more powerful than Holm, but under some slightly pathological
situations it can exceed the nominal Type I error rate.

I haven't looked at this code for a long time, but I think it works
correctly. Refeerences for the methods are in "Adjusted p-values for
simultaneous inference" by S. Paul Wright in Biometrics some time in
the early 90s (I'm in the wrong country at the moment so I don't have the
full reference).

        -thomas
-------------------------

   p.adjust.holm <-function(p,n=length(p)) {
        ##n<-length(p)
        r<-rank(p)
        index<-order(p)
        qi<-p*(n+1-r)
        for (i in 2:length(p)) {
                qi[index[i]]<-max(qi[index[i]], qi[index[i-1]])
        }
        list(adjp=pmin(qi,1),p=p,method="Holm")
        }
  p.adjust.hochberg <-function(p) {
        n<-length(p)
        r<-rank(p)
        index<-order(p)
        qi<-p*(n+1-r)
        for (i in (n-1):1) {
                qi[index[i]]<-min(qi[index[i]], qi[index[i+1]])
        }
        list(adjp=qi,p=p,method="Hochberg")
        }
  p.adjust.bonferroni<-function(p,n=length(p)){
        list(adjp=pmin(p*n,1),p=p,method="Bonferroni")
        }

p.adjust<-function(p,method=c("hochberg","holm","bonferroni"),...){
  how<-pmatch(method[1],c("hochberg","holm","bonferroni"))
  if (is.na(how)) stop(paste("Don't know method:",method))
  m<-match.call()
  m[[1]]<-as.name(paste("p.adjust",c("hochberg","holm","bonferroni")[how],sep=".
"))
  m$method<-NULL
  eval(m,sys.parent())
}


>   I know one of the students around here looked in Zar's Biostatistics
> book and coded up a multiple comparisons test for himself; if I get around
> to it I will get it from him and post it.  The hardest part of the job,
> the distribution of the test statistic for (? I forget which one: Tukey's
> LSD ?), is already incorporated in R as qtukey() and ptukey().

Tukey's *H*SD (also known as the Studentised Range), LSD is Fishers
Least Significant Difference, i.e. uncorrected t tests, HSD for
"*Honestly* Signif. Diff."....

We don't seem to have the many-one t distribution ( max(X_k - X_1)/s,
X_1,...,X_n ~ indep. N(0,sigma), s^2 ~ sigma^2*chisq(df) ), though.
Anyone have an algorithm?

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list