[R] Interpretation of hypothesis tests for mixed models

Mon Dec 16 21:23:03 CET 2002

Sorry if my questions and statements were not clear enough. It is of
course true that the F- and t-tests from fm2 would always be more
conservative than those from fm1, your example for the Machines data
make this point. My (tentative) suggestion was that tests from fm2 are
closer to realising the nominal level of a test. This was based on
running the following code:

library(nlme)
k <- 2            # num subj
n <- 10           # subj/treatm sample size
m1 <- 1.0         # value of fixed effect of treatm 1
m2 <- m1          # same for treatm 2
s.sbj <- 0.5      # SD subj random effect
s.sbj.tr <- 1.0   # SD subj by treatm interaction
s.res <- 0.1      # Residual SD
Subj <- factor( rep(1:k, each=2*n) )
Trt <- factor( rep( rep(c("Tr1","Tr2"), k), each=n) )
m <- rep( rep(c(m1, m2), k), each=n) # fixed effects

n.sims <- 1000
p.f1 <- rep(0, n.sims)
p.f2 <- rep(0, n.sims)
for (i in 1:n.sims) {
  # first get subject/treatment random deviations
  b <- rep( rep(rnorm(k, 0, s.sbj), each=2) + 
           rnorm(2*k, 0, s.sbj.tr), each=n )
  # then get response
  y <- m + b + rnorm(2*k*n, 0, s.res)
  dat <- data.frame(Subj, Trt, y)
  fm1 <- lme(y ~ Trt, data = dat,
             random = list(Subj = pdCompSymm(~ Trt - 1)))
  p.f1[i] <- anova(fm1)$"p-value"[2]
  fm2 <- lme(y ~ Trt, data = dat,
             random = ~ 1 | Subj/Trt)
  p.f2[i] <- anova(fm2)$"p-value"[2]
}

print( sum(p.f1<0.05)/n.sims ) # should be about 0.05
print( sum(p.f2<0.05)/n.sims ) # should be about 0.05
print( sum(p.f1<0.01)/n.sims ) # should be about 0.01
print( sum(p.f2<0.01)/n.sims ) # should be about 0.01

rm(k,n,m1,m2,s.sbj,s.sbj.tr,s.res,Subj,Trt,m)
rm(i,b,y,dat,fm1,fm2)
#rm(n.sims,p.f1,p.f2)

Sorry if the code seems pedestrian. I am not an experienced S
programmer, but I hope I generated data in agreement with the
assumptions of the mixed models I am fitting. As an example of results,
I got
[1] 0.3
[1] 0.053
[1] 0.229
[1] 0.002
suggesting that fm2 comes much closer to the nominal significance
levels. My simulated data are of course a bit extreme (only two
subjects, etc) just to illustrate how much the two tests can differ. 

Coming back to my previous questions, I worry that models like 

 lme(y ~ x, random = ~ x | Subj) 

have a similar problem as fm1 in being too liberal (because denDF in a
test for an effect of x will be large in the same way as for fm1). Am I
wrong in worrying about this?  

-- 
Olof Leimar, Professor
Department of Zoology
Stockholm University
SE-106 91 Stockholm
Sweden

Olof.Leimar at zoologi.su.se