# [R] Simulate p-value in lme4

Manuel Morales Manuel.A.Morales at williams.edu
Mon Aug 21 21:10:42 CEST 2006

```Spencer,

Thanks for the reply. I concluded the same wrt between group variation
soon after posting. However, the approach I ended up with was fully
parametric as opposed to the resampling approach that you use in your
reply. Interestingly, the two approaches yield different P-values, I
think because your approach retains overdispersion in the data (?). In
any case, my parametric stab at this is below.

iter <- 10
chisq.sim <- rep(NA, iter)

Zt <- slot(model1,"Zt") # see ?lmer-class
n.grps <- dim(ranef(model1)[])
sd.ran.effs <- sqrt(VarCorr(model1)[])
X <- slot(model1,"X") # see ?lmer-class
fix.effs <- matrix(rep(fixef(model1),dim(X)), nrow=dim(X),
byrow=T)
model.parms <- X*fix.effs # This gives parameters for each case
# Generate predicted values
pred.vals <- as.vector(apply(model.parms, 1, sum))

for(i in 1:iter) {
rand.new <- as.vector(rnorm(grps,0, sd.ran.effs))
rand.vals <- as.vector(rand.new%*%Zt) # Assign random effects
mu <- pred.vals+rand.vals # Expected mean
resp <- rpois(length(mu), exp(mu))
sim.data <- data.frame(slot(model2,"frame"), resp) # Make data frame
sim.model1 <- lmer(resp~1+(1|subject), data=sim.data,
family="poisson")
sim.model2 <- lmer(resp~pred+(1|subject), data=sim.data,
family="poisson")
chisq.sim[i] <- anova(sim.model1,sim.model2)\$Chisq[]
}

Manuel

On Sun, 2006-08-20 at 11:22 -0700, Spencer Graves wrote:
>       You've raised a very interesting question about testing a
> fixed-effect factor with more than 2 levels using Monte Carlo.  Like
> you, I don't know how to use 'mcmcsamp' to refine the naive
> approximation. If we are lucky, someone else might comment on this for us.
>
>       Beyond this, you are to be commended for providing such a simple,
> self-contained example for such a sophisticated question.  I think you
> simulation misses one important point:  It assumes the between-subject
> variance is zero.  To overcome this, I think I might try either the
> bootstrap or permutation distribution scrambling the assignment of
> subjects to treatment groups but otherwise keeping the pairs of
> observations together.
>
>       To this end, consider the following:
>
> # Build a table to translate subject into 'pred'
> o <- with(epil3, order(subject, y))
> epil3. <- epil3[o,]
> norep <- with(epil3., subject[-1]!=subject[-dim(epil3)])
> subj1 <- which(c(T, norep))
> subj.pred <- epil3.[subj1, c("subject", "pred")]
> subj. <- as.character(subj.pred\$subject)
> pred. <- subj.pred\$pred
> names(pred.) <- subj.
>
> iter <- 10
> chisq.sim <- rep(NA, iter)
>
> set.seed(1)
> for(i in 1:iter){
> ## Parameteric version
s.i <- sample(subj.)
> # Randomize subject assignments to 'pred' groups
>   epil3.\$pred <- pred.[s.i][epil3.\$subject]
>   fit1 <- lmer(y ~ pred+(1 | subject),
>                 family = poisson, data = epil3.)
>   fit0 <- lmer(y ~ 1+(1 | subject),
>                 family = poisson, data = epil3.)
>   chisq.sim[i] <- anova(fit0, fit1)[2, "Chisq"]
> }
>
>       Hope this helps.
>       Spencer Graves
>
> Manuel Morales wrote:
> > Dear list,
> >
> > This is more of a stats question than an R question per se. First, I
> > realize there has been a lot of discussion about the problems with
> > estimating P-values from F-ratios for mixed-effects models in lme4.
> > Using mcmcsamp() seems like a great alternative for evaluating the
> > significance of individual coefficients, but not for groups of
> > coefficients as might occur in an experimental design with 3 treatment
> > levels. I'm wondering if the simulation approach I use below to estimate
> > the P-value for a 3-level factor is appropriate, or if there are any
> > suggestions on how else to approach this problem. The model and data in
> > the example are from section 10.4 of MASS.
> >
> > Thanks!
> > Manuel
> >
> > # Load req. package (see functions to generate data at end of script)
> > library(lme4)
> > library(MASS)
> >
> > # Full and reduced models - pred is a factor with 3 levels
> > result.full <- lmer(y~pred+(1|subject), data=epil3, family="poisson")
> > result.base <- lmer(y~1+(1|subject), data=epil3, family="poisson")
> >
> > # Naive P-value from LR for significance of "pred" factor
> > anova(result.base,result.full)\$"Pr(>Chisq)"[] # P-value
> > (test.stat <- anova(result.base,result.full)\$Chisq[]) # Chisq-stat

<snip> Wrong approach here</snip>

> > # Script to generate data, from section 10.4 of MASS
> > epil2 <- epil[epil\$period == 1, ]
> > epil2["period"] <- rep(0, 59); epil2["y"] <- epil2["base"]
> > epil["time"] <- 1; epil2["time"] <- 4
> > epil2 <- rbind(epil, epil2)
> > epil2\$pred <- unclass(epil2\$trt) * (epil2\$period > 0)
> > epil2\$subject <- factor(epil2\$subject)
> > epil3 <- aggregate(epil2, list(epil2\$subject, epil2\$period > 0),
> >                    function(x) if(is.numeric(x)) sum(x) else x)
> > epil3\$pred <- factor(epil3\$pred, labels = c("base", "placebo", "drug"))
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help