[R] two-way categorical anova post-hoc data extraction

hadley wickham h.wickham at gmail.com
Wed Dec 12 20:59:47 CET 2007

> Hi list,
> I have a question regarding post-hoc extraction of data from a two-way categorical anova.
> I have a categorical anova of this form:
> width ~ steepness + patchiness (4 steepness levels, 4 patchiness levels)
> This simple setup answers if for the widths I collected across different levels of steepness and patchiness significant differences can be found. Is there a way to look at these differences in detail. Lets say that the steepness parameter is significant, then I would like to know between which levels they are significant or if there are levels where this isn't the case.
> It's a basic question but my R knowledge has faded somewhat...

I've never found this particularly easy to do.  For a recent client I wrote:


effectsum <- function(model, effect) {
  effects <- as.data.frame(all.effects(model)[[effect]])

  mcp <- list("Tukey")
  names(mcp) <- effect
  class(mcp) <- "mcp"

  glht_sum <- summary(glht(model, linfct = mcp))
  p <- as.vector(glht_sum$test$pvalues)
  names(p) <- gsub(" ", "", names(glht_sum$test$tstat))

  groups <- multcompLetters(p)
  effects[, 1] <- factor(effects[, 1])
  effects$group <- groups$Letters[as.character(effects[, 1])]


which allows you to do:

mtcars$cyl <- factor(mtcars$cyl)
simple <- lm(mpg ~ wt + cyl, data=mtcars)
effects <- effectsum(simple, "cyl")

qplot(cyl, fit, data=effects, min = lower, max = upper,
geom="pointrange", ylab="Mean effect") + geom_text(aes(label = group,
y = min(lower) - diff(range(lower)) * 0.07))

ggplot(mtcars, aes(x = cyl, y = mpg)) + geom_crossbar(aes(min=lower,
max=upper, y=fit), data=effects, width=0.2) +

This gives lsmeans (aka population marginal means) with their standard
errors, and groups generated from all pairwise comparisons adjusted
for multiple comparisons.

I would love to improve this code: to deal with all factors in a model
automatically.  Additionally, all.effects and multcompLetters are
rather fragile with respect to level names - if you get an error try
removing any non-alphanumeric characters,



More information about the R-help mailing list