[R] Possible bug in lmer nested analysis with factors

Sun Sep 18 17:04:40 CEST 2005

On 9/18/05, Yan Wong <h.y.wong at leeds.ac.uk> wrote:
> 
> On 16 Sep 2005, at 17:21, Sundar Dorai-Raj wrote:
> 
> > My guess is he wants this:
> >
> > c1 <- factor(c)
> > d1 <- factor(d)
> > m <- lmer(a ~ b + (1|c1:d1)+(1|c1))
> >
> > which assumes d1 is nested within c1.
> >
> > Take a look at Section 3 in the "MlmSoftRev" vignette:
> >
> > library(mlmRev)
> > vignette("MlmSoftRev")
> 
> Ah, that vignette is extremely useful: it deserves to be more widely
> known.
> I mainly intended this reply to be a thank you to yourself and Harold.
> 
> Unfortunately, there is (as always), one last thing that is still
> puzzling me:
> the df for fixed factors seems to vary between what I currently
> understand to
> be equivalent calls to lme and lmer, e.g:
> 
> -------
> 
> a<-rnorm(36);
> b<-factor(rep(1:3,each=12))
> c<-factor(rep(1:2,each=6,3))
> d<-factor(rep(1:3,each=2,6))
> c <- evalq(b:c)[,drop=T] #make unique factor levels
> d <- evalq(c:d)[,drop=T] #make unique factor levels
> 
> summary(lme(a ~ b, random=~1|c/d))
> #  output includes estimates for fixed effects such as
> #                    Value Std.Error DF    t-value p-value
> #  (Intercept)  0.06908901 0.3318330 18  0.2082041  0.8374
> #  b2           0.13279084 0.4692828  3  0.2829655  0.7956
> #  b3          -0.26146698 0.4692828  3 -0.5571630  0.6163
> 
> # I understand the above lme model to be equivalent to
> summary(lmer(a ~ b + (1|c) +(1|c:d))
> #but this gives fixed effects estimates with differing DF:
> #               Estimate Std. Error DF t value Pr(>|t|)
> #  (Intercept)  0.069089   0.331724 33  0.2083   0.8363
> #  b2           0.132791   0.469128 33  0.2831   0.7789
> #  b3          -0.261467   0.469128 33 -0.5573   0.5811
> 
> Again, many thanks for your help: even more so if you or anyone
> else can answer this last little niggle of mine.

I'm coming to the discussion late and would also like to thank Sundar
and Harold for their explanations.

You are correct that good documentation of the capabilities of lmer
does not currently exist. lmer is still under active development and
documentation is spread in several places.  The vignette in the mlmRev
package explores some of the capabilities of lmer.  Also see the
examples in that package.

You are correct that the denominator degrees of freedom associated
with terms in the fixed effects is different between lme and lmer. 
Neither of them is "right" because there is no correct answer but the
values from lmer are definitely more wrong than the values from lme.
The problem is that lmer allows a wider range of models than does lme.
 In lme the grouping factors can only be nested.  You can fake crossed
grouping factors but you do need to fake them.  Lmer allows nested or
crossed or partially crossed grouping factors.  Most of the subtlety
in the design of lmer is to handle the case of partially crossed
grouping factors in large data sets (think of value-added models that
are applied to longitudinal data on students crossed with teachers in
schools within school districts within states ...).  Some arguments on
degrees of freedom can be made for nested grouping factors but the
question of testing fixed effects terms for models with partially
crossed grouping factors is difficult.  I am aware of the 1997
Biometrics paper by Kenward and Roger but I find it difficult to
translate their formulae into something I can evaluate.  Their
representation of a linear mixed model is as a generalized least
squares problem whereas lmer uses a penalized least squares
representation.  These are equivalent but sometimes the translations
back and forth are difficult to write down.

This area could be a very fruitful research area for people with
strong mathematical and implementation skills.  I have a partially
completed writeup on the details of the lmer representation and
implementation (the description of the linear mixed model is done -
I'm still working on the description of the generalized linear mixed
model and the nonlinear mixed model) which I could forward to anyone
interested in such a challenge.  There are two or three approaches
that could be used and I can provide some references.  An extensive
comparison of the properties of these approaches across a range of
real problems would be a valuable contribution to the literature but
it would involve a lot of work in implementation.

There are already some facilities for lmer models such as mcmcsamp and
simulate which can be used for evaluating the posterior distribution
of a single coefficient or for a parametric bootstrap of the reference
distribution of a quantity like the likelihood ratio statistic for a
hypothesis test.