# [R] glm and lrm disagree with zero table cells

Eric Rescorla ekr at rtfm.com
Thu Oct 24 17:52:13 CEST 2002

```
I've noticed that glm and lrm give extremely different results if you
attempt to fit a saturated model to a dataset with zero cells. Consider,
for instance, the data from Agresti's Death Penalty example [0].

The crosstab table is:

, , PENALTY = NO

       VIC
DEF     BLACK WHITE
  BLACK    97    52
  WHITE     9   132

, , PENALTY = YES

       VIC
DEF     BLACK WHITE
  BLACK     6    11
  WHITE     0    19

Regression with an unsaturated model produces essentially
the same fit parameters with both glm and lrm. However,
if we try to fit a saturated model....

FITTING WITH GLM:
> summary(glm(PENALTY~DEF*VIC,binomial))

Call:
glm(formula = PENALTY ~ DEF * VIC, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.6195  -0.5186  -0.5186  -0.3465   2.3845

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        -2.7830     0.4207  -6.615 3.71e-11 ***
DEFWHITE           -4.7823     8.8981  -0.537   0.5910
VICWHITE            1.2296     0.5358   2.295   0.0217 *
DEFWHITE:VICWHITE   4.3973     8.9076   0.494   0.6216
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 226.51  on 325  degrees of freedom
Residual deviance: 218.39  on 322  degrees of freedom
AIC: 226.39

Number of Fisher Scoring iterations: 6

FITTING WITH LRM:
> lrm(PENALTY~DEF*VIC)

Logistic Regression Model

lrm(formula = PENALTY ~ DEF * VIC)

Frequencies of Responses
 NO YES
290  36

        Obs  Max Deriv Model L.R.       d.f.          P          C        Dxy
        326      0.002       8.13          3     0.0435      0.624      0.248
      Gamma      Tau-a         R2      Brier
      0.383      0.049      0.049      0.096

                      Coef   S.E.    Wald Z P
Intercept             -2.783  0.4207 -6.62  0.0000
DEF=WHITE             -5.490 20.8691 -0.26  0.7925
VIC=WHITE              1.230  0.5358  2.29  0.0217
DEF=WHITE * VIC=WHITE  5.105 20.8732  0.24  0.8068

If we fill in the empty table cell with a dummy value [1], however,
then glm and lrm produce essentially the same result.
Here's the glm result:

> summary(glm(PENALTY~DEF*VIC,binomial))

Call:
glm(formula = PENALTY ~ DEF * VIC, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.6195  -0.5186  -0.5186  -0.3465   2.3845

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        -2.7829     0.4192  -6.639 3.15e-11 ***
DEFWHITE            0.5857     1.1343   0.516   0.6056
VICWHITE            1.2296     0.5346   2.300   0.0215 *
DEFWHITE:VICWHITE  -0.9707     1.2070  -0.804   0.4213
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 230.90  on 326  degrees of freedom
Residual deviance: 224.88  on 323  degrees of freedom
AIC: 232.88

Number of Fisher Scoring iterations: 4

So, my question here is: is this normal behavior? If it
is, perhaps someone could speculate on why the results are
different.

Thanks,
-Ekr

[0] Agresti, A., "Categorical Data Analysis", Wiley 1990.
The data set can be found at http://www.rtfm.com/death.txt

[1] http://www.rtfm.com/death-filled-in.txt

```
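
A minimal sketch for reproducing the glm side of this comparison directly from the cell counts in the crosstab, without the data files. The data frame names and the `YES`/`NO` count columns are illustrative assumptions, and the model is fitted on grouped counts rather than on case-level rows as in the post, which should give the same coefficient estimates. With zero YES outcomes in the DEF=WHITE, VIC=BLACK cell, the maximum-likelihood estimate of that cell's log-odds is not finite, so the reported `DEFWHITE` and interaction coefficients largely reflect where the iterations happen to stop. The "filled-in" fit assumes the dummy value was a single YES case in the empty cell; that is a guess consistent with the 327 observations and the coefficients reported above, not something the post states.

```r
# Cell counts copied from the crosstab in the post.
death <- data.frame(
  DEF = factor(c("BLACK", "BLACK", "WHITE", "WHITE")),
  VIC = factor(c("BLACK", "WHITE", "BLACK", "WHITE")),
  YES = c(6, 11, 0, 19),
  NO  = c(97, 52, 9, 132)
)

# Saturated logistic model on grouped counts, default convergence settings.
# Warnings about fitted probabilities of 0 or 1 are expected here.
fit <- suppressWarnings(
  glm(cbind(YES, NO) ~ DEF * VIC, family = binomial, data = death)
)

# Same model with a tighter tolerance and more iterations allowed.
# The coefficients tied to the zero cell drift further toward -Inf/+Inf,
# while the intercept and VICWHITE stay put.
fit_tight <- suppressWarnings(
  glm(cbind(YES, NO) ~ DEF * VIC, family = binomial, data = death,
      control = glm.control(epsilon = 1e-12, maxit = 100))
)

# ASSUMPTION: the dummy value is one YES case in the empty cell.
death_filled <- death
death_filled$YES[death_filled$DEF == "WHITE" &
                 death_filled$VIC == "BLACK"] <- 1
fit_filled <- glm(cbind(YES, NO) ~ DEF * VIC, family = binomial,
                  data = death_filled)

# Compare the three sets of estimates side by side.
print(round(cbind(default   = coef(fit),
                  tight     = coef(fit_tight),
                  filled_in = coef(fit_filled)), 4))
```

If the sketch behaves as expected, the `(Intercept)` and `VICWHITE` rows agree across all three columns, while the `DEFWHITE` and `DEFWHITE:VICWHITE` rows change with the stopping rule in the first two columns and only settle down once the empty cell is filled in.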