[R] glm StepAIC with all interactions and update to remove a term vs. glm specifying all but a few terms and stepAIC

Tue Jan 27 14:57:34 CET 2009

At 12:49 26/01/2009, Robert Michael Inman wrote:
>Problem:
>I am sorting through model selection process for first time and want to make
>sure that I have used glm, stepAIC, and update correctly.  Something is
>strange because I get a different result between:
>
>1) a glm of 12 predictor variables followed by a stepAIC where all
>interactions are considered and then an update to remove one specific
>interaction.
>
>vs.
>
>2) entering all the terms individually in a glm (exept the one that I
>removed with update and 4 others like it but which did not make it to final
>model anyway), and then running stepAIC.

I am not the world's leading expert on this area but nobody else 
seems to have replied yet so here goes:
1 - stepwise methods capitalise on chance features of your dataset 
and so starting from a slightly different place may give different 
results. For instance if you do all possible subsets then the 'best' 
subset of size k is not guaranteed to include the members of the 
'best' subset of size j (j<k) and indeed may not include any of them.
2 - the lack of significance of some predictors in the last model is 
probably because stepAIC respects marginality, certainly MASS implies this.
You might find
@book{miller90,
    author = "Miller, A J",
    title = "Subset selection in regression",
    year = "1990",
    publisher = "Chapman and Hall",
    address = "London",
    keywords= {stepwise}
}
helpful. And MASS of course since the package is support for the book.

>Question:
>Why do these processes not yield same model?
>
>
>
>Here are all the details if helpful:
>I start with 12 potential predictor variables, 7 "primary" terms and 5
>additional that are I(primary_terms^2).  I run a glm for these 12 and then
>do stepAIC (BIC actually) both directions.  The scope argument is
>scope=list(upper=~.^2,lower=NULL).  This means there are 78 predictor terms
>considered, the 12 primary terms and 66 interactions [n(n+1)/2].  I see this
>with trace=T also.  Here is the code used:
>
> >glm1<-glm(formula = PRESENCE == "1" ~ SNOW + I(SNOW^2) + POP_DEN + ROAD_DE
>+ ADJELEV + I(ADJELEV^2) + TRI + I(TRI^2) + EDGE + I(EDGE^2) + TREECOV +
>I(TREECOV^2),family = binomial, data = wolv)
>     summary(glm1)
> >library(MASS)
> >stepglm2<-stepAIC(glm1,scope=list(upper=~.^2,lower=NULL),
>trace=T,k=log(4828),direction="both")
> >    summary(stepglm2)
> >    extractAIC(stepglm2,k=log(4828))
>
>This results in a 15 term model with a BIC of 3758.659
>
> >    Coefficients:
> >                            Estimate Std. Error z value Pr(>|z|)
> >    (Intercept)           -4.983e+01  9.263e+00  -5.379 7.50e-08 ***
> >    SNOW                   6.085e-02  8.641e-03   7.041 1.90e-12 ***
> >    ROAD_DE               -5.637e-01  1.192e-01  -4.730 2.24e-06 ***
> >    ADJELEV                2.880e-02  7.457e-03   3.863 0.000112 ***
> >    I(ADJELEV^2)          -4.038e-06  1.487e-06  -2.715 0.006618 **
> >    TRI                    5.675e-02  1.081e-02   5.248 1.54e-07 ***
> >    I(TRI^2)              -1.713e-03  4.243e-04  -4.036 5.43e-05 ***
> >    EDGE                   6.418e-03  1.697e-03   3.782 0.000156 ***
> >    TREECOV                1.680e-01  2.929e-02   5.735 9.76e-09 ***
> >    SNOW:ADJELEV          -4.313e-05  6.935e-06  -6.219 5.00e-10 ***
> >    ADJELEV:TREECOV       -6.628e-05  1.161e-05  -5.711 1.13e-08 ***
> >    SNOW:I(ADJELEV^2)      7.437e-09  1.384e-09   5.373 7.74e-08 ***
> >    TRI:I(TRI^2)           1.321e-06  3.419e-07   3.863 0.000112 ***
> >    I(ADJELEV^2):I(TRI^2) -2.127e-10  5.745e-11  -3.702 0.000214 ***
> >    ADJELEV:I(TRI^2)       1.029e-06  3.004e-07   3.424 0.000617 ***
> >    SNOW:TRI               1.057e-05  3.372e-06   3.135 0.001721 **
>
>
>
>The final model included a the TRI:I(TRI^2) term, which is effectively a
>cubic function.  So this was removed because cubic's were not considered for
>all variables.  I used update to remove TRI:I(TRI^2).  Code:
>
> >stepglm3<-update(stepglm2,~.-TRI:I(TRI^2),trace=T)
> >    summary(stepglm3)
> >    extractAIC(stepglm3,k=log(4828))
>
>This results in a 14 term model with a BIC of 3770.172.  The BIC is a little
>higher, but the cubic term improved fit and is no longer in, so expected.
>
> >Coefficients:
> >                            Estimate Std. Error z value Pr(>|z|)
> >    (Intercept)           -5.329e+01  9.267e+00  -5.750 8.92e-09 ***
> >    SNOW                   6.241e-02  8.695e-03   7.178 7.06e-13 ***
> >    ROAD_DE               -5.756e-01  1.184e-01  -4.863 1.16e-06 ***
> >    ADJELEV                3.233e-02  7.452e-03   4.338 1.44e-05 ***
> >    I(ADJELEV^2)          -4.724e-06  1.487e-06  -3.177 0.001489 **
> >    TRI                    1.834e-02  5.402e-03   3.395 0.000687 ***
> >    I(TRI^2)              -1.122e-03  3.920e-04  -2.863 0.004190 **
> >    EDGE                   6.344e-03  1.690e-03   3.754 0.000174 ***
> >    TREECOV                1.745e-01  2.923e-02   5.969 2.39e-09 ***
> >    SNOW:ADJELEV          -4.444e-05  6.984e-06  -6.363 1.98e-10 ***
> >    ADJELEV:TREECOV       -6.885e-05  1.160e-05  -5.937 2.90e-09 ***
> >    SNOW:I(ADJELEV^2)      7.681e-09  1.395e-09   5.506 3.67e-08 ***
> >    I(ADJELEV^2):I(TRI^2) -1.839e-10  5.692e-11  -3.232 0.001231 **
> >    ADJELEV:I(TRI^2)       8.860e-07  2.974e-07   2.979 0.002892 **
> >    SNOW:TRI               1.219e-05  3.260e-06   3.740 0.000184 ***
>
>This all seems to be as it should.  I then decided to try and confim this
>result by running a glm without any of the 5 potential cubic terms ( note -
>TRI:I(TRI^2) was the only one that made it into the final model but there
>were 5 potential).  After entering the 73 potential terms (12 primary
>vaiables and now 66 minus 5 interactions = 73 total), the glm and stepAIC
>produces a completely different final model.  It has 8 variables that were
>not in the model that was chosen with scope statement and manually removing
>TRI:TRI^2, and it is missing 7 variables that were in the model chosen with
>the scope statement.  It has 8 variables that were in both.  Code and
>Result:
>
> >glmalt1b<-glm(formula = PRESENCE =="1" ~
>SNOW+SNOW:POP_DEN+SNOW:ROAD_DE+SNOW:ADJELEV+SNOW:I(ADJELEV^2)+SNOW:TRI+SNOW:
>I(TRI^2)+SNOW:EDGE+SNOW:I(EDGE^2)+SNOW:TREECOV+SNOW:I(TREECOV^2)+I(SNOW^2)+I
>(SNOW^2):POP_DEN+
> >
>I(SNOW2):ROAD_DE+I(SNOW^2):ADJELEV+I(SNOW^2):I(ADJELEV^2)+I(SNOW^2):TRI+I(SN
>OW^2):I(TRI^2)+I(SNOW^2):EDGE+I(SNOW^2):I(EDGE^2)+I(SNOW^2):TREECOV+I(SNOW^2
>):I(TREECOV^2)+POP_DEN+POP_DEN:ROAD_DE+
> >
>POP_DEN:ADJELEV+POP_DEN:I(ADJELEV^2)+POP_DEN:TRI+POP_DEN:I(TRI^2)+POP_DEN:ED
>GE+POP_DEN:I(EDGE^2)+POP_DEN:TREECOV+POP_DEN:I(TREECOV^2)+ROAD_DE+ROAD_DE:AD
>JELEV+ROAD_DE:I(ADJELEV^2)+ROAD_DE:TRI+
> >
>ROAD_DE:I(TRI^2)+ROAD_DE:EDGE+ROAD_DE:I(EDGE^2)+ROAD_DE:TREECOV+ROAD_DE:I(TR
>EECOV^2)+ADJELEV+ADJELEV:TRI+ADJELEV:I(TRI^2)+ADJELEV:EDGE+ADJELEV:I(EDGE^2)
>+ADJELEV:TREECOV+ADJELEV:I(TREECOV^2)+I(ADJELEV^2)+
> >
>I(ADJELEV^2):TRI+I(ADJELEV^2):I(TRI^2)+I(ADJELEV^2):EDGE+I(ADJELEV^2):I(EDGE
>^2)+I(ADJELEV^2):TREECOV+I(ADJELEV^2):I(TREECOV^2)+TRI+TRI:EDGE+TRI:I(EDGE^2
>)+TRI:TREECOV+TRI:I(TREECOV^2)+I(TRI^2)+
> >
>I(TRI^2):EDGE+I(TRI^2):I(EDGE^2)+I(TRI^2):TREECOV+I(TRI^2):I(TREECOV^2)+EDGE
>+EDGE:TREECOV+EDGE:I(TREECOV^2)+I(EDGE^2)+I(EDGE^2):TREECOV+I(EDGE^2):I(TREE
>COV^2)+TREECOV+I(TREECOV^2), family=binomial, data=wolv)
> >            summary(glmalt1b)
> >            stepglmalt2b<-stepAIC(glmalt1b, trace=T, k=log(4828),
>direction="both")  #BIC
> >            summary(stepglmalt2b)
> >            extractAIC(stepglmalt2b,k=log(4828))
> >
> >Coefficients:
> >                                  Estimate Std. Error z value Pr(>|z|)
> >       (Intercept)               -1.995e+01  7.499e+00  -2.660 0.007819 **
> >       SNOW                       1.641e-02  4.881e-03   3.363 0.000772 ***
> >       I(SNOW^2)                  2.238e-05  4.729e-06   4.732 2.22e-06 ***
> >       ROAD_DE                   -5.619e-01  1.187e-01  -4.733 2.21e-06 ***
> >       ADJELEV                    4.361e-03  5.876e-03   0.742 0.457966
> >       I(ADJELEV^2)               1.001e-06  1.165e-06   0.859 0.390257
> >       TRI                       -1.982e-01  6.066e-02  -3.268 0.001083 **
> >       I(TRI^2)                  -6.842e-05  1.868e-05  -3.664 0.000249 ***
> >       I(EDGE^2)                  6.321e-05  2.119e-05   2.983 0.002857 **
> >       I(TREECOV^2)               2.947e-03  4.984e-04   5.912 3.38e-09 ***
> >       SNOW:ADJELEV              -6.244e-06  1.959e-06  -3.187 0.001438 **
> >       SNOW:TRI                   1.018e-05  3.403e-06   2.991 0.002778 **
> >       I(SNOW^2):ADJELEV         -1.852e-08  3.477e-09  -5.326 1.00e-07 ***
> >       I(SNOW^2):I(ADJELEV^2)     3.726e-12  6.771e-13   5.503 3.73e-08 ***
> >       ADJELEV:TRI                1.887e-04  4.895e-05   3.855 0.000116 ***
> >       I(ADJELEV^2):TRI          -4.010e-08  9.697e-09  -4.135 3.55e-05 ***
> >       I(ADJELEV^2):I(TREECOV^2) -4.532e-10  7.727e-11  -5.865 4.48e-09 ***
>
>
>
>If anyone can tell me why this is different I would greatly appreciate it.
>Also, why does this last model include terms that are not significant?
>
>Thanks
>
>Bob

Michael Dewey
http://www.aghmed.fsnet.co.uk