[R] logistic mixed effects models with lmer

Fri Dec 28 16:35:37 CET 2007

I have a question about some strange results I get when using lmer to
build a logistic mixed effects model.  I have a data set of about 30k
points, and I'm trying to do backwards selection to reduce the number
of fixed effects in my model.  I've got 3 crossed random effects and
about 20 or so fixed effects.  At a certain point, I get a model (m17)
where the fixed effects are like this (full output is at end of
message):

> print(m17, corr=F)
...
Fixed effects:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)            -1.97887    0.19699 -10.045  < 2e-16 ***
sexM                    0.45553    0.14387   3.166 0.001544 **
...
is_discTRUE             0.24676    0.15204   1.623 0.104576
poly(wfreq, 2)1      -119.72397   11.00516 -10.879  < 2e-16 ***
poly(wfreq, 2)2        17.35646    5.44456   3.188 0.001433 **
poly(wlen_p, 2)1      -13.60798    7.26926  -1.872 0.061208 .
poly(wlen_p, 2)2       -6.43167    5.24119  -1.227 0.219770
...

where poly(wlen_p,2)2 is the least significant factor left in the
model.  So I then build a model (m18) with exactly the same random and
fixed effects except removing poly(wlen_p,2)2.  Then I do an anova,
and I get:

> anova(m17,m18)
Data:
Models:
m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m17:     after_part + first_rep + is_open + is_disc + poly(wfreq,
m18:     2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
m17:     poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
m18:     (1 | speaker) + (1 | corpus) + (1 | ref)
m17: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m18:     after_part + first_rep + is_open + is_disc + poly(wfreq,
m17:     2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur, 2) +
m18:     pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange,
m17:     2) + (1 | speaker) + (1 | corpus) + (1 | ref)
    Df    AIC    BIC logLik  Chisq Chi Df Pr(>Chisq)
m18 27  25928  26153 -12937
m17 28  25925  26159 -12934 5.2136      1    0.02241 *

So my first question is: Should I be concerned that the significance
level shown in the original m17 is so different from the one shown by
the anova?  It's hard for me to see how this could happen.  I noticed
that there is a post on the FAQ about significance levels in linear
mixed models, but I'm not sure whether it applies to the logistic
case and if so how.

Now, my second question is a result of removing one more factor
(is_disc) from the model, creating m19.  I do another anova:

> anova(m19,m18)
Data:
Models:
m19: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m18:     after_part + first_rep + is_open + poly(wfreq, 2) + wlen_p +
m19:     poly(utt_rate, 2) + poly(dur, 2) + pmean + poly(log_prange,
m18:     2) + poly(imean, 2) + poly(irange, 2) + (1 | speaker) + (1 |
m19:     corpus) + (1 | ref)
m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m19:     after_part + first_rep + is_open + is_disc + poly(wfreq,
m18:     2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
m19:     poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
m18:     (1 | speaker) + (1 | corpus) + (1 | ref)
    Df    AIC    BIC logLik Chisq Chi Df Pr(>Chisq)
m19 26  25925  26142 -12936
m18 27  25928  26153 -12937     0      1          1

and now it seems that m19 (which contains fewer parameters than m18)
is a better fit.  I don't see how it's possible to remove parameters
from a model and get a better likelihood, but I will confess that I
don't entirely understand how these kinds of models are estimated.
Does this have something to do with approximations that R is making to
fit the models, or numerical rounding errors?  Could either problem be
due to correlations among variables?  All my fixed effects are either
binary or numeric, and there are some fairly high correlations between
a few pairs of them (maybe as high as .6 or .65 using Kendall tau) but I
figured that this would be ok given the large number of data points.
I think each value of each binary feature is observed at least
70-100 times.

In case it matters, I'm running R 2.5.1 on Linux, with lme4 0.99875-8, and
below is a full printout of all my output:

> m17 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part + after_part + first_rep + is_open + is_disc + poly(wfreq,2) + poly(wlen_p,2) + poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2) + poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")

> print(m17,corr=F)
Generalized linear mixed model fit using Laplace
Formula: is_err ~ sex + starts_turn + before_hes + after_hes +
before_part +      after_part + first_rep + is_open + is_disc +
poly(wfreq,      2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur,
2) +      pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange,
    2) + (1 | speaker) + (1 | corpus) + (1 | ref)
 Family: binomial(logit link)
   AIC   BIC logLik deviance
 25925 26159 -12934    25869
Random effects:
 Groups  Name        Variance Std.Dev.
 ref     (Intercept) 0.434599 0.65924
 speaker (Intercept) 0.327813 0.57255
 corpus  (Intercept) 0.039722 0.19930
number of obs: 31017, groups: ref, 2601; speaker, 72; corpus, 2

Estimated scale (compare to  1 )  0.9810366

Fixed effects:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)            -1.97887    0.19699 -10.045  < 2e-16 ***
sexM                    0.45553    0.14387   3.166 0.001544 **
starts_turnTRUE         0.25426    0.08087   3.144 0.001666 **
before_hesTRUE          0.50943    0.12364   4.120 3.79e-05 ***
after_hesTRUE           0.26439    0.12162   2.174 0.029712 *
before_partTRUE         1.07972    0.10748  10.046  < 2e-16 ***
after_partTRUE          0.47089    0.12363   3.809 0.000140 ***
first_repTRUE           1.00673    0.13073   7.701 1.35e-14 ***
is_openTRUE            -0.42213    0.09894  -4.267 1.98e-05 ***
is_discTRUE             0.24676    0.15204   1.623 0.104576
poly(wfreq, 2)1      -119.72397   11.00516 -10.879  < 2e-16 ***
poly(wfreq, 2)2        17.35646    5.44456   3.188 0.001433 **
poly(wlen_p, 2)1      -13.60798    7.26926  -1.872 0.061208 .
poly(wlen_p, 2)2       -6.43167    5.24119  -1.227 0.219770
poly(utt_rate, 2)1      5.21605    3.56227   1.464 0.143126
poly(utt_rate, 2)2     22.75593    3.04234   7.480 7.45e-14 ***
poly(dur, 2)1        -110.73779    6.15641 -17.987  < 2e-16 ***
poly(dur, 2)2          38.71283    3.52495  10.983  < 2e-16 ***
pmean                   1.00184    0.17912   5.593 2.23e-08 ***
poly(log_prange, 2)1    3.71200    3.76913   0.985 0.324702
poly(log_prange, 2)2   17.07179    2.79954   6.098 1.07e-09 ***
poly(imean, 2)1       -21.73475    4.75449  -4.571 4.84e-06 ***
poly(imean, 2)2        28.22247    3.32294   8.493  < 2e-16 ***
poly(irange, 2)1       -6.47276    3.95778  -1.635 0.101955
poly(irange, 2)2        6.99340    3.11937   2.242 0.024966 *

#remove wlen^2
> m18 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part + after_part + first_rep + is_open + is_disc + poly(wfreq,2) + wlen_p + poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2) + poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")

> anova(m17,m18)
Data:
Models:
m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m17:     after_part + first_rep + is_open + is_disc + poly(wfreq,
m18:     2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
m17:     poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
m18:     (1 | speaker) + (1 | corpus) + (1 | ref)
m17: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m18:     after_part + first_rep + is_open + is_disc + poly(wfreq,
m17:     2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur, 2) +
m18:     pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange,
m17:     2) + (1 | speaker) + (1 | corpus) + (1 | ref)
    Df    AIC    BIC logLik  Chisq Chi Df Pr(>Chisq)
m18 27  25928  26153 -12937
m17 28  25925  26159 -12934 5.2136      1    0.02241 *

#remove is_disc
> m19 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part + after_part + first_rep + is_open + poly(wfreq,2) + wlen_p + poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2) + poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")

> anova(m19,m18)
Data:
Models:
m19: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m18:     after_part + first_rep + is_open + poly(wfreq, 2) + wlen_p +
m19:     poly(utt_rate, 2) + poly(dur, 2) + pmean + poly(log_prange,
m18:     2) + poly(imean, 2) + poly(irange, 2) + (1 | speaker) + (1 |
m19:     corpus) + (1 | ref)
m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m19:     after_part + first_rep + is_open + is_disc + poly(wfreq,
m18:     2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
m19:     poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
m18:     (1 | speaker) + (1 | corpus) + (1 | ref)
    Df    AIC    BIC logLik Chisq Chi Df Pr(>Chisq)
m19 26  25925  26142 -12936
m18 27  25928  26153 -12937     0      1          1

--AuH2O

Sharon Goldwater
Postdoctoral Scholar
Department of Linguistics
Stanford Universtiy