[R] StepAIC

Christian Hennig fm3a004 at math.uni-hamburg.de
Mon Mar 29 18:07:37 CEST 2004


Dear list,

Here is an example of stepAIC that I do not understand.
The data set has n=42 observations; Lage is the only factor, and the four
other variables are treated as continuous.

First you see the forward stepAIC solution (fs7). The strange thing here
is that apparently not all interactions are tried for inclusion, only
WQ:Lage. In particular, I think that WFL:Lage should be tried
in the last three steps, where WFL and Lage are already in the fit.
After fs7, I give the output of fs6 (backward), where all interactions are
tried as I expected. (regsubsets works properly forward and
backward.)
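
One way to cross-check which terms are eligible for addition at a given
step, independent of the data, is add.scope() from the stats package. It
lists the terms of an upper formula that can be added to a current formula
while respecting marginality. The following is only a minimal sketch: the
names upper, current and candidates are introduced here for illustration,
and the result may depend on the R version, so it is a sanity check rather
than an account of what stepAIC does internally.

```r
# Upper scope as passed to stepAIC in this post (one-sided formula).
upper <- ~ RW1 + WFL + WQ + VD + Lage + Lage*WFL + Lage*WQ + Lage*VD

# Right-hand side of the model reached at the step "Preis ~ WQ + Lage + WFL".
# (Illustrative name; the response is omitted because add.scope only
# compares the term structure of the two formulas.)
current <- ~ WQ + Lage + WFL

# Terms of `upper` that may be added to `current` under marginality.
candidates <- add.scope(current, upper)
print(candidates)
```

If WFL:Lage shows up in candidates but not in the stepAIC table for the
same step, the discrepancy would point at how the candidate terms are
fitted and labelled rather than at how the scope is built.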

Do I misunderstand something or is something strange going on in the
forward fit?

(I don't want to discuss here whether the forward fit is a good thing to do
from a data-analytic viewpoint. I agree that I probably should not
choose it. However, I want to understand what the algorithm does.)

Thank you,
Christian

> w6 <- lm(Preis~RW1+WFL+WQ+VD+Lage+Lage*WFL+Lage*WQ+Lage*VD,
+          data=wohnung)
> w7 <- lm(Preis~1, data=wohnung)

> fs7 <- stepAIC(w7,
+                scope=list(upper=~RW1+WFL+WQ+VD+Lage+Lage*WFL+Lage*WQ+Lage*VD,
+                           lower=~1),
+                direction="forward")
Start:  AIC= 623.57 
 Preis ~ 1 

       Df Sum of Sq       RSS       AIC
+ WQ    1  37219390  75101315       609
+ Lage  1  19029749  93290956       618
+ WFL   1  12506022  99814682       621
+ RW1   1   7299347 105021358       623
<none>              112320704       624
+ VD    1   5170556 107150149       624

Step:  AIC= 608.66 
 Preis ~ WQ 

       Df Sum of Sq      RSS      AIC
+ Lage  1   4736613 70364702      608
<none>              75101315      609
+ WFL   1   1863992 73237323      610
+ VD    1    555800 74545515      610
+ RW1   1    462284 74639030      610

Step:  AIC= 607.92 
 Preis ~ WQ + Lage 

          Df Sum of Sq      RSS      AIC
+ WFL      1   4721973 65642729      607
<none>                 70364702      608
+ WQ:Lage  1   2829768 67534934      608
+ RW1      1   2567408 67797294      608
+ VD       1    678458 69686244      610

Step:  AIC= 607.01 
 Preis ~ WQ + Lage + WFL 

          Df Sum of Sq      RSS      AIC
+ WQ:Lage  1   5610596 60032132      605
+ RW1      1   3404796 62237933      607
<none>                 65642729      607
+ VD       1    925528 64717201      608

Step:  AIC= 605.25 
 Preis ~ WQ + Lage + WFL + WQ:Lage 

       Df Sum of Sq      RSS      AIC
+ RW1   1   3492210 56539923      605
<none>              60032132      605
+ VD    1    355353 59676779      607

Step:  AIC= 604.74 
 Preis ~ WQ + Lage + WFL + RW1 + WQ:Lage 

       Df Sum of Sq      RSS      AIC
<none>              56539923      605
+ VD    1     94023 56445900      607


Backward fit:
> stepAIC(w6)
Start:  AIC= 596.53 
 Preis ~ RW1 + WFL + WQ + VD + Lage + Lage * WFL + Lage * WQ +  
    Lage * VD 

           Df Sum of Sq      RSS      AIC
- WQ:Lage   1    190953 40507327      595
- RW1       1    865788 41182162      595
<none>                  40316374      597
- WFL:Lage  1   6491181 46807556      601
- VD:Lage   1  12307855 52624230      606

Step:  AIC= 594.73 
 Preis ~ RW1 + WFL + WQ + VD + Lage + WFL:Lage + VD:Lage 

           Df Sum of Sq      RSS      AIC
- RW1       1    756790 41264117      594
- WQ        1   1910020 42417348      595
<none>                  40507327      595
- WFL:Lage  1  10302360 50809687      602
- VD:Lage   1  13222644 53729971      605

Step:  AIC= 593.51 
 Preis ~ WFL + WQ + VD + Lage + WFL:Lage + VD:Lage 

           Df Sum of Sq      RSS      AIC
- WQ        1   1793962 43058080      593
<none>                  41264117      594
- WFL:Lage  1  12069383 53333500      602
- VD:Lage   1  13657842 54921959      604

Step:  AIC= 593.3 
 Preis ~ WFL + VD + Lage + WFL:Lage + VD:Lage 

           Df Sum of Sq      RSS      AIC
<none>                  43058080      593
- WFL:Lage  1  14241342 57299422      603
- VD:Lage   1  19078878 62136957      607

Call:
lm(formula = Preis ~ WFL + VD + Lage + WFL:Lage + VD:Lage, data = wohnung)

Coefficients:
(Intercept)          WFL           VD        Lage2    WFL:Lage2     VD:Lage2  
  -53269.15        55.92      8025.62     59259.63       -46.71     -8233.36  



***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de



