[R] trouble with step() and stepAIC() selecting the best model

natalia norden norden at cict.fr
Fri May 5 11:14:18 CEST 2006


Hello,
I have some trouble using step() and stepAIC() functions.
I'm predicting recruitment against several factors for different plant 
species using a negative binomial glm.

Sometimes, summary(step(model)) or summary(stepAIC(model) does not 
select the best model (lowest AIC) but just stops before.
For some species, step() works and stepAIC don't and in others, it's the 
opposite.

The model is

  model=glm.nb(sdg.cens1 ~ seed.cens1 + log(in.lit+1) + 
log(trans.dir+1)+ log(trans.dif+1) + slope +log(pH+1) + log(CN+1) + site,
         data=data)


For example, for species A using stepAIC() I get:

Call:
glm.nb(formula = tot.rec ~ slope + log(pH + 1) + log(CN + 1),
     data = data, init.theta = 0.0575099635396228, link = log)

Deviance Residuals:
        Min          1Q      Median          3Q         Max
-7.591e-01  -3.688e-01  -1.828e-01  -8.494e-08   1.520e+00

Coefficients:
               Estimate Std. Error   z value Pr(>|z|)
(Intercept)    110.729     47.643     2.324  0.02012 *
slope2         -34.202 972079.452 -3.52e-05  0.99997
slope3          -1.423      1.371    -1.038  0.29928
log(pH + 1)    -51.244     19.544    -2.622  0.00874 **
log(CN + 1)     -9.132      6.906    -1.322  0.18602
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.0575) family taken to be 1)

     Null deviance: 34.068  on 156  degrees of freedom
Residual deviance: 19.602  on 152  degrees of freedom
AIC: 90.357


But using step() I get:

Step:  AIC= 138.4
  tot.rec ~ log(mean.lit.rec + 1) + log(trans.dir + 1) + log(trans.dif +
     1) + slope + log(pH + 1) + log(CN + 1) + site


Call:
glm.nb(formula = tot.rec ~ log(mean.lit.rec + 1) + log(trans.dir +
     1) + log(trans.dif + 1) + slope + log(pH + 1) + log(CN +
     1) + site, data = data, init.theta = 74070.0501196965, link = log)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-2.3110  -0.5450  -0.3179   0.0000   3.8755

Coefficients:
                         Estimate Std. Error   z value Pr(>|z|)
(Intercept)            6.809e+01  1.905e+01     3.574 0.000351 ***
log(mean.lit.rec + 1) -1.238e+00  1.083e+00    -1.143 0.252987
log(trans.dir + 1)     2.069e-01  1.227e+00     0.169 0.866120
log(trans.dif + 1)    -6.978e-01  1.578e+00    -0.442 0.658327
slope2                -3.730e+01  1.022e+07 -3.65e-06 0.999997
slope3                -9.195e-01  6.269e-01    -1.467 0.142488
log(pH + 1)           -3.152e+01  6.918e+00    -4.556 5.22e-06 ***
log(CN + 1)           -5.053e+00  3.404e+00    -1.485 0.137643
site1                  1.379e-01  4.618e-01     0.299 0.765278
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(91071.54) family taken to be 1)

     Null deviance: 145.586  on 156  degrees of freedom
Residual deviance:  99.166  on 148  degrees of freedom
AIC: 140.40



However, for species B I get the opposite, it works with step() but not 
with stepAIC().

Using step() AIC =414.28 and using stepAIC(), AIC=424.15 and the results 
are totally different.

I would appreciate any help!!
Thank you

Natalia Norden




More information about the R-help mailing list