[R] question about glm behavior

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Nov 13 21:31:19 CET 2007


On Tue, 13 Nov 2007, Yip wrote:

> I was trying a glm fitting (as shown below) and I got a warning and a 
> fitted residual deviance larger than the null deviance. Is this the 
> expected behavor of glm?

It can be.  One of two things is going on here:

A) There is complete separation between the classes: there is a linear 
combination of the explanatory variables than is negative on class 0 and 
positive on class 1.  So there is no (finite) MLE and the best fit has 
probability one for all cases.

B) The fit is extreme, with most cases with fitted probabilities 
indistinguishable from one, but with some that are not.

I think it is B) in your case.

> I would expect that even though the warning might be warranted I should 
> not get worse fitting with an additional covariate in the model. Could 
> anyone tell me what I'm missing?

That the fit has not converged (you were warned).  Trying trace:

> glm(f~x1+x2+g, family=binomial(link="logit"), trace=TRUE)
Deviance = 33.11507 Iterations - 1
Deviance = 24.94139 Iterations - 2
Deviance = 22.43989 Iterations - 3
Deviance = 21.39709 Iterations - 4
Deviance = 20.84556 Iterations - 5
Deviance = 20.35107 Iterations - 6
Deviance = 19.58265 Iterations - 7
Deviance = 18.79010 Iterations - 8
Deviance = 18.54549 Iterations - 9
Deviance = 17.93380 Iterations - 10
Deviance = 50.95846 Iterations - 11
Deviance = 4123.945 Iterations - 12
....

so the algorithm ran into numerical difficulties after 10 steps (and was 
in any case converging slowly).  What has happened is that 40 of the 
cases have fitted probabilities so near one that they cannot be 
represented accurately.  This is inevitable in the current formulation:

> binomial()$variance
function (mu)
mu * (1 - mu)
...

and 1-mu cannot be computed accurately, hence the warning.

There are numerically stable ways to do this calculation, but not I think 
within the glm() framework.


> I get the same results in both R2.5.1 on windows and 2.5.0 on
> amd64-pc-linux.
>
> Thank you!
>
> --Yiping
>
>> x1
> [1] -4.320069e-02 -4.574697e-02 -4.091768e-02 -4.405891e-02  1.723491e-02
> [6] -4.534468e-02 -3.514106e-02 -4.115642e-02 -4.384857e-02 -4.098890e-02
> [11] -4.423853e-02 -4.320503e-02 -3.949471e-02 -4.438473e-02 -4.328487e-02
> [16] -4.100160e-02 -4.238224e-02 -3.961957e-02 -4.343796e-02 -4.375786e-02
> [21] -4.291601e-02 -4.363903e-02 -3.944323e-02 -4.492861e-02 -4.314135e-02
> [26] -4.407560e-02 -4.262769e-02 -4.446548e-02 -4.332781e-02 -4.330175e-02
> [31] -4.533415e-02  3.365331e-01 -4.394758e-02 -4.291074e-02 -4.400895e-02
> [36] -4.230687e-02 -4.389724e-02  3.303411e-01 -3.907463e-02  3.147079e-01
> [41] -4.174071e-02  2.152082e-01 -4.189925e-02 -4.189424e-02 -3.934256e-02
> [46]  3.075004e-01 -4.372070e-02 -4.253140e-02 -3.940970e-02 -4.363639e-02
> [51] -4.294994e-02 -2.925661e-02 -2.721621e-02 -4.225099e-02  2.083608e-01
> [56] -4.356620e-02 -4.142498e-02 -4.303415e-02 -4.297786e-02  5.919961e-05
> [61] -4.367790e-02 -4.370738e-02 -4.025630e-02 -3.830009e-02  3.117086e-01
> [66]  3.290152e-01 -2.538811e-02 -4.468090e-02  2.784418e-01  3.511454e-02
> [71]  2.913742e-01 -4.387962e-02 -4.268680e-02 -4.171391e-02 -4.202010e-02
> [76] -4.322166e-02 -3.706578e-02 -4.390152e-02 -3.587444e-02 -4.384691e-02
> [81] -1.480347e-03 -4.389834e-02 -4.185581e-02 -6.265638e-03 -4.202021e-02
> [86] -1.808579e-03 -4.264013e-02
>> x2
> [1] -0.030178811 -0.030383785 -0.024195005 -0.015303415  0.505774406
> [6] -0.023545322 -0.026341561 -0.031700423 -0.033170361 -0.016807687
> [11] -0.030765392 -0.032847977 -0.022241150 -0.030362745 -0.034153581
> [16] -0.034486433 -0.026723467 -0.024891104 -0.029309960 -0.028852347
> [21] -0.031251711 -0.025874702 -0.025032587 -0.035013137 -0.019088676
> [26] -0.021198249 -0.029941158 -0.026721874 -0.023171718 -0.032476985
> [31] -0.028476595 -0.048960041 -0.029837101 -0.017454529 -0.026821415
> [36] -0.027268238 -0.023836780 -0.065271644 -0.034818157 -0.032746126
> [41] -0.025742516 -0.057888806 -0.029212633 -0.022687920 -0.027572481
> [46]  0.027638085 -0.030762472 -0.024144720 -0.027526008 -0.025081009
> [51] -0.026338116 -0.017626651  0.001067303 -0.028009695 -0.039969065
> [56] -0.025602297 -0.027279180 -0.015731185 -0.023141642  0.333521248
> [61] -0.025868744 -0.021978049 -0.022180398 -0.010942728  0.021911932
> [66] -0.103832477 -0.017300324 -0.024064572 -0.036209915  0.602510894
> [71] -0.023094473 -0.024840084 -0.035736611 -0.027606170 -0.019964185
> [76] -0.029183955 -0.023701532 -0.027828418 -0.014666741 -0.024613236
> [81]  0.284105237 -0.026196040 -0.029447691  0.280181037 -0.027221380
> [86]  0.184637507 -0.039031585
>> g
> [1] 2 1 1 2 2 2 1 2 2 2 2 2 2 2 2 1 2 2 2 1 1 1 1 2 2 2 2 2 2 1 1 1 1 0 1 2
> 1 2
> [39] 1 1 2 1 2 1 1 1 0 1 2 1 1 2 2 2 2 1 1 1 1 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2
> 2 1
> [77] 1 2 2 2 2 1 1 2 2 1 1
>> f
> [1] 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0
> 1 0
> [39] 1 1 0 1 0 1 1 1 1 1 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 1 0 0 1 1 0 0 0 0 0
> 0 1
> [77] 1 0 0 0 0 1 1 0 0 1 1
>> table(f,g)
>   g
> f    0  1  2
>  0  0  0 44
>  1  2 38  3
>> glm(f~x1+x2+g, family=binomial(link="logit"), na.action=na.omit)
>
> Call:  glm(formula = f ~ x1 + x2 + g, family = binomial(link = "logit"),
> na.action = na.omit)
>
> Coefficients:
> (Intercept)           x1           x2            g
>  9.184e+15   -4.359e+15    7.889e+14   -6.190e+15
>
> Degrees of Freedom: 86 Total (i.e. Null);  83 Residual
> Null Deviance:      120.6
> Residual Deviance: 216.3        AIC: 224.3
> Warning message:
> fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y,
> weights = weights, start = start, etastart = etastart,
>>
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list