[R] Simulate p-value in lme4

Manuel Morales Manuel.A.Morales at williams.edu
Thu Aug 17 22:49:31 CEST 2006

Dear list,

This is more of a stats question than an R question per se. First, I
realize there has been a lot of discussion about the problems with
estimating P-values from F-ratios for mixed-effects models in lme4.
Using mcmcsamp() seems like a great alternative for evaluating the
significance of individual coefficients, but not for groups of
coefficients as might occur in an experimental design with 3 treatment
levels. I'm wondering if the simulation approach I use below to estimate
the P-value for a 3-level factor is appropriate, or if there are any
suggestions on how else to approach this problem. The model and data in
the example are from section 10.4 of MASS.


# Load req. package (see functions to generate data at end of script)

# Full and reduced models - pred is a factor with 3 levels
result.full <- lmer(y~pred+(1|subject), data=epil3, family="poisson")
result.base <- lmer(y~1+(1|subject), data=epil3, family="poisson")

# Naive P-value from LR for significance of "pred" factor
anova(result.base,result.full)$"Pr(>Chisq)"[[2]] # P-value
(test.stat <- anova(result.base,result.full)$Chisq[[2]]) # Chisq-stat

# P-value from simulation. Note that in the simulation, I use the
# estimated random effects for each subject rather than generating a new
# distribution of means. I'm not sure if this is appropriate or not ...
intercept <- fixef(result.base)
rand.effs <- ranef(result.base)[[1]]
mu <- exp(rep(intercept+rand.effs[[1]],2))

p.value <- function(iter, stat) {
  chi.stat <- vector()
  for(i in 1:iter) {
    resp <- rpois(length(mu), mu) # simulate values
    sim.data <- data.frame(y=resp,subject=epil3$subject,pred=epil3$pred)
    result.f <- lmer(y~pred+(1|subject), data=sim.data,
    result.b <- lmer(y~1+(1|subject), data=sim.data, family="poisson")
    chi.stat[i] <- anova(result.b,result.f)$Chisq[[2]]
  val <- sum(unlist(lapply(chi.stat, function(x) if(x>stat) 1 else

p.value(10,test.stat) # Increase to >=1000 to get a reasonable P-value!

# Script to generate data, from section 10.4 of MASS
epil2 <- epil[epil$period == 1, ]
epil2["period"] <- rep(0, 59); epil2["y"] <- epil2["base"]
epil["time"] <- 1; epil2["time"] <- 4
epil2 <- rbind(epil, epil2)
epil2$pred <- unclass(epil2$trt) * (epil2$period > 0)
epil2$subject <- factor(epil2$subject)
epil3 <- aggregate(epil2, list(epil2$subject, epil2$period > 0),
                   function(x) if(is.numeric(x)) sum(x) else x[1])
epil3$pred <- factor(epil3$pred, labels = c("base", "placebo", "drug"))

More information about the R-help mailing list