# [Rd] step.lm() fails to drop {many empty 2-way factor cells} (PR#3491)

maechler at stat.math.ethz.ch maechler at stat.math.ethz.ch
Wed Jul 16 13:02:41 MEST 2003

```Exec. Summary:
step() basically ``fails'' whereas MASS' stepAIC() does work

This may not be a bug in the strictest sense, but at least
something for the wish list.  Unfortunately I have no time
currently to investigate further myself but want to be sure this
won't be forgotten:

The example is using a real data set with 216 observations on 9
variables -- where we have anonymized variable and factor level names.
The 8 explanatory variables are 4 factors (with 3 / 2 / 4 / 4 levels)
and 4 continuous covariates.  The factor combinations observed
are very far from balanced, and have actually quite a few 0 cells.

The following is a reproducible R script
(you need internet connection and our FTP server working) :

## step.lm()  "fails"  when 2-way factor interactions have (many) empty cells
##             whereas stepAIC() works fine

Dat\$f3 <- ordered(Dat\$f3)

## Full Model :  y ~ (f1+f2+f3+f4 + C1+C2+C3+C4)
## ----------   where f1..f4 are FACTORS  and C1..C4 are continuous covariates

## "Look at data"
str(Dat)
summary(Dat)

## several 2-way interactions have empty cells, e.g.
with(Dat, table(f1,f4)) #!
with(Dat, table(f1,f3))

## see the full design:
with(Dat, ftable(table(f1,f2,f3,f4))) ##--> many many 0
pairs(Dat, col = 1+as.integer(Dat\$f1), pch = 1+as.integer(Dat\$sex))

## Fit a "full model" with all 2-way interactions :
lmAll <- lm(formula = y ~ .^2, data = Dat)
summary(lmAll) # many insignificant ones {BTW: why is it bad to have "*" here?}

drop1(lmAll)
## clearly shows interaction terms that should be dropped (would decrease AIC)

## BUT:
sr <- step(lmAll)
## nothing eliminated  ?step claims it would work when drop1() does

library(MASS)
sAr <- stepAIC(lmAll) ## <<< stepAIC() *does* work
## but with instructive
## There were 19 warnings (use warnings() to see them)
warnings()
##- Warning messages:
##- 1: 0 df terms are changing AIC in: stepAIC(lmAll)
##- 2: 0 df terms are changing AIC in: stepAIC(lmAll)
##- ...
##- ...
summary(sAr)

```