[R] Interpretation of output from glm

Wed Nov 9 16:45:06 CET 2005

Dear John,

Thanks for the quick reply. I did indeed have these ideas, but only in a
vague, "floating" form, and everything I could find on the subject dealt
with categorical predictors. Can you suggest a good book where I could
learn more about this?

Thanks again,

Pedro

At 01:49 09/11/2005, you wrote:
>Dear Pedro,
>
>
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro de Barros
> > Sent: Tuesday, November 08, 2005 9:47 AM
> > To: r-help at stat.math.ethz.ch
> > Subject: [R] Interpretation of output from glm
> > Importance: High
> >
> > I am fitting a logistic model to binary data. The response
> > variable is a factor (0 or 1) and all predictors are
> > continuous variables. The main predictor is LT (I expect a
> > logistic relation between LT and the probability of being
> > mature) and the others are variables I expect to modify this relation.
> >
> > I want to test whether all predictors contribute significantly to
> > the fit. I fit the full model and get these results:
> >
> >  > summary(HMMaturation.glmfit.Full)
> >
> > Call:
> > glm(formula = Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom,
> >      family = binomial(link = "logit"), data = HMIndSamples)
> >
> > Deviance Residuals:
> >      Min       1Q   Median       3Q      Max
> > -3.0983  -0.7620   0.2540   0.7202   2.0292
> >
> > Coefficients:
> >                Estimate Std. Error z value Pr(>|z|)
> > (Intercept) -8.789e-01  3.694e-01  -2.379  0.01735 *
> > LT           5.372e-02  1.798e-02   2.987  0.00281 **
> > CondF       -6.763e-02  9.296e-03  -7.275 3.46e-13 ***
> > Biom        -1.375e-02  2.005e-03  -6.856 7.07e-12 ***
> > LT:CondF     2.434e-03  3.813e-04   6.383 1.74e-10 ***
> > LT:Biom      7.833e-04  9.614e-05   8.148 3.71e-16 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > (Dispersion parameter for binomial family taken to be 1)
> >
> >      Null deviance: 10272.4  on 8224  degrees of freedom
> > Residual deviance:  7185.8  on 8219  degrees of freedom
> > AIC: 7197.8
> >
> > Number of Fisher Scoring iterations: 8
> >
> > However, when I run anova on the fit, I get
> >
> >  > anova(HMMaturation.glmfit.Full, test='Chisq')
> > Analysis of Deviance Table
> >
> > Model: binomial, link: logit
> >
> > Response: Mature
> >
> > Terms added sequentially (first to last)
> >
> >
> >             Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> > NULL                        8224    10272.4
> > LT          1   2873.8      8223     7398.7       0.0
> > CondF       1      0.1      8222     7398.5       0.7
> > Biom        1      0.2      8221     7398.3       0.7
> > LT:CondF    1    142.1      8220     7256.3 9.413e-33
> > LT:Biom     1     70.4      8219     7185.8 4.763e-17
> > Warning message:
> > fitted probabilities numerically 0 or 1 occurred in: method(x
> > = x[, varseq <= i, drop = FALSE], y = object$y, weights =
> > object$prior.weights,
> >
> >
> > I am having a little difficulty interpreting these results.
> > The result from the fit tells me that all predictors are
> > significant, while
> > the anova indicates that besides LT (the main variable), only the
> > interaction of the other terms is significant, but the main
> > effects are not.
> > I believe that in the first output (on the glm object), the
> > significance of
> > all terms is calculated considering each of them alone in the
> > model (i.e.
> > removing all other terms), while the anova output is (as it says)
> > considering the sequential addition of the terms.
> >
> > So, there are 2 questions:
> > a) Can I conclude that the interactions are significant, but not
> > the main effects?
>
>In a model with this structure, the "main effects" represent slopes over the
>origin (i.e., where the other variables in the product terms are 0), and
>aren't meaningfully interpreted as main effects. (Is there even any data
>near the origin?)
>
> > b) Is it legitimate to fit a model that includes the interactions
> > but not the main effects CondF and Biom?
>
>Generally, no: That is, such a model is interpretable, but it places strange
>constraints on the regression surface -- that the CondF and Biom slopes are
>0 over the origin.
>
>None of this is specific to logistic regression -- it applies generally to
>generalized linear models, including linear models.
>
>I hope this helps,
>  John
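
A minimal sketch of John's point about slopes at the origin, using simulated data rather than the HMIndSamples data from the thread. The variable names LT, CondF and Mature are borrowed from the original model; the sample size, the coefficient values and the centred copies LTc and CondFc are assumptions made up for illustration. Centring the predictors leaves the fitted model and the interaction coefficient unchanged, but moves the point at which the "main effect" slopes are evaluated from 0 to the sample means, so their estimates and Wald tests change.

```r
# Simulated data: names mirror the thread, values are invented.
set.seed(1)
n <- 2000
d <- data.frame(LT = runif(n, 20, 60), CondF = rnorm(n, 100, 10))
eta <- -8 + 0.2 * d$LT + 0.002 * (d$LT - 40) * (d$CondF - 100)
d$Mature <- rbinom(n, 1, plogis(eta))

# Raw predictors: the CondF coefficient is the CondF slope at LT = 0,
# which lies far outside the simulated LT range (20 to 60).
fit_raw <- glm(Mature ~ LT + CondF + LT:CondF, family = binomial, data = d)

# Centred predictors: the CondFc coefficient is the slope at the mean LT.
d$LTc    <- d$LT - mean(d$LT)
d$CondFc <- d$CondF - mean(d$CondF)
fit_ctr <- glm(Mature ~ LTc + CondFc + LTc:CondFc, family = binomial, data = d)

# Same fitted model: the deviance and the interaction coefficient agree.
same_deviance <- all.equal(deviance(fit_raw), deviance(fit_ctr),
                           tolerance = 1e-6)
same_interaction <- all.equal(unname(coef(fit_raw)["LT:CondF"]),
                              unname(coef(fit_ctr)["LTc:CondFc"]),
                              tolerance = 1e-4)

# The centred CondF slope is the raw slope shifted along LT to mean(LT).
shifted_slope <- unname(coef(fit_raw)["CondF"] +
                        coef(fit_raw)["LT:CondF"] * mean(d$LT))
centred_slope <- unname(coef(fit_ctr)["CondFc"])

# Compare the "main effect" rows of the two Wald tables.
print(coef(summary(fit_raw)))
print(coef(summary(fit_ctr)))

# Sequential tests (terms added first to last) versus the marginal test
# of the only term that can be dropped without violating marginality.
print(anova(fit_raw, test = "Chisq"))
print(drop1(fit_raw, test = "Chisq"))
```

Note that each Wald test in the summary() table is conditional on all the other terms being in the model, whereas anova() adds terms one at a time in formula order, which is why the two tables can disagree about CondF and Biom.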