[R] Specification decisions in glm and lmer

Wed Mar 1 03:09:18 CET 2006

I have been reviewing GLM and LMER to sharpen up some course notes and
would like to ask for your advice.

1. Is there a test that would be used to check whether a particular
functional form--say Gaussian, Gamma, or Inverse Gaussian, is "more
appropriate" in a Generalized Linear Model?  A theoretical reason to
choose one over the other is enough for me, but I've got some
skeptical students who want a "significance test."  I've done some
simulation tests and find that the AIC is smaller if you guess the
correct distribution, but as long as you have the link and the
functional form of the right hand side correct, then your parameter
estimates are not horribly affected by the choice of the distribution.
 If your data is Gamma, for example, the estimates from GLM-Normal are
just about as good.  Do you think so?

A draft of my notes is on my new web site:

http://pj.freefaculty.org/ps707/GLM/GammaGLM-01.pdf

If you look down to sections 7 and 8, (or the graphs on p. 9 and p.
17), you might see what I mean.  I still have work to do on this and
if you have pointers, please email me.

2. Suppose you fix a random effects model with lmer and you wonder if
the fit is "better" (I suppose in the sense of anova) than a glm that
is the same, except for the lack of a random effect.

Is there a test for that?  I've been reading the thread in the r-help
list from December, 2005, in which people are asking for a standard
error or confidence interval for a variance component and the answer
seems to be that such a thing is not statistically meaningful.

Would you consider an example using survey data about attitudes toward
the police in US cities?  The variable PLACE indicates which city
people live in. We wonder if there should be a random intercept to
represent diversity across cities.  I want something that works like
anova() might.  What to do?

> gl1 <- glm ( RE2TRCOP~ AGE +  POLINT + HAPPY + EFFCOM + TRNEI +GRPETH + TRBLK + BLKCHIEF + RCOPDIFF+ RCOPRATE + RCOPBKIL + RCRIM3YR + RPCTBLAK, family=binomial(link=logit),data=eldat)
> summary(gl1)

Call:
glm(formula = RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI +
    GRPETH + TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL +
    RCRIM3YR + RPCTBLAK, family = binomial(link = logit), data = eldat)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5129  -0.5515  -0.4072  -0.2807   2.7975

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.549801   0.666762  -2.324 0.020106 *
AGE         -0.020215   0.005423  -3.728 0.000193 ***
POLINT      -0.148035   0.075748  -1.954 0.050666 .
HAPPY       -0.309656   0.114871  -2.696 0.007024 **
EFFCOM      -0.150475   0.088664  -1.697 0.089670 .
TRNEI        0.376409   0.091701   4.105 4.05e-05 ***
GRPETH       0.593949   0.205169   2.895 0.003792 **
TRBLK        0.575473   0.114896   5.009 5.48e-07 ***
BLKCHIEF    -0.233197   0.188521  -1.237 0.216093
RCOPDIFF     0.051088   0.150306   0.340 0.733938
RCOPRATE     0.169789   0.097540   1.741 0.081733 .
RCOPBKIL     0.147662   0.089269   1.654 0.098102 .
RCRIM3YR     0.189701   0.097590   1.944 0.051913 .
RPCTBLAK    -0.504626   0.175897  -2.869 0.004119 **
---
Signif. codes:  0 $-1òø***òù 0.001 òø**òù 0.01 òø*òù 0.05 òø.òù 0.1 òø òù 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1364.4  on 1694  degrees of freedom
Residual deviance: 1197.5  on 1681  degrees of freedom
AIC: 1225.5

Number of Fisher Scoring iterations: 5

me1 <- lmer( RE2TRCOP~ AGE +  POLINT + HAPPY + EFFCOM + TRNEI +GRPETH
+ TRBLK + BLKCHIEF + RCOPDIFF+ RCOPRATE + RCOPBKIL + RCRIM3YR +
RPCTBLAK + (1 | PLACE) , family=binomial(link=logit),data=eldat,
method="Laplace")
Loading required package: lattice
> summary(me1)
> Generalized linear mixed model fit using Laplace
Formula: RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH +
TRBLK +      BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR +
RPCTBLAK +      (1 | PLACE)
   Data: eldat
 Family: binomial(logit link)
      AIC      BIC    logLik deviance
 1227.446 1308.977 -598.7229 1197.446
Random effects:
     Groups        Name    Variance    Std.Dev.
      PLACE (Intercept)   0.0056343    0.075062
# of obs: 1695, groups: PLACE, 18

Estimated scale (compare to 1)  1.014432

Fixed effects:
              Estimate Std. Error z value  Pr(>|z|)
(Intercept) -1.5478617  0.6708198 -2.3074 0.0210315 *
AGE         -0.0202473  0.0054271 -3.7308 0.0001909 ***
POLINT      -0.1481644  0.0757952 -1.9548 0.0506069 .
HAPPY       -0.3102757  0.1149718 -2.6987 0.0069609 **
EFFCOM      -0.1501713  0.0887251 -1.6925 0.0905419 .
TRNEI        0.3757690  0.0918107  4.0929 4.261e-05 ***
GRPETH       0.5903173  0.2053184  2.8751 0.0040386 **
TRBLK        0.5754894  0.1149954  5.0045 5.602e-07 ***
BLKCHIEF    -0.2275911  0.1935195 -1.1761 0.2395698
RCOPDIFF     0.0483413  0.1541093  0.3137 0.7537626
RCOPRATE     0.1681531  0.1003416  1.6758 0.0937760 .
RCOPBKIL     0.1437347  0.0924614  1.5545 0.1200562
RCRIM3YR     0.1825188  0.1013626  1.8007 0.0717576 .
RPCTBLAK    -0.4959112  0.1814734 -2.7327 0.0062819 **
---

> anova(gl1,me1)
Analysis of Deviance Table

Model: binomial, link: logit

Response: RE2TRCOP

Terms added sequentially (first to last)

           Df Deviance Resid. Df Resid. Dev
NULL                        1694    1364.45
AGE         1    26.26      1693    1338.19
POLINT      1    14.18      1692    1324.02
HAPPY       1    25.80      1691    1298.21
EFFCOM      1     5.08      1690    1293.13
TRNEI       1    37.65      1689    1255.49
GRPETH      1     8.02      1688    1247.46
TRBLK       1    26.87      1687    1220.59
BLKCHIEF    1     1.36      1686    1219.23
RCOPDIFF    1    10.29      1685    1208.94
RCOPRATE    1     0.35      1684    1208.59
RCOPBKIL    1     0.71      1683    1207.88
RCRIM3YR    1     1.77      1682    1206.12
RPCTBLAK    1     8.63      1681    1197.48
>

Are there tests that can help you decide if the distribution of the
dependent variable is Gamma or inverse Gaussian, or Gaussian for that
matter?  Is one supposed to use the AIC from the glm estimates to
decide which is best?  The deviance is not comparable across families,
right?

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas