[R] model simplification using Crawley as a guide

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Jun 11 13:42:10 CEST 2008

ChCh wrote:
> Hello,
> I have consciously avoided using step() for model simplification in favour
> of manually updating the model by removing non-significant terms one at a
> time.  I'm using The R Book by M.J. Crawley as a guide. It comes as no
> surprise that my analysis does not proceed as smoothly as Crawley's does, and
> being a beginner, I'm struggling with what to do next.  
> I have a model:
> lm(y~A * B * C)
> where A is a categorical variable with three levels and B and C are
> continuous covariates.
> Following Crawley, I execute the model, then use summary.aov() to identify
> non-significant terms.  I begin deleting non-significant interaction terms
> one at a time (using update).  After each update() statement, I use
> anova(modelOld,modelNew) to contrast the previous model with the updated
> one.  After removing all the interaction terms, I'm left with:
> lm(y~ A + B + C)
> again, using summary.aov() I identify A to be non-significant, so I remove
> it, leaving:
> lm(y~B + C), where B and C are both continuous variables.
> Does it still make sense to use summary.aov() or should I use summary.lm()
> instead?  Has the analysis switched from an ANCOVA to a regression?  Both
> give different results so I'm uncertain which summary to accept.
> Any help would be appreciated!
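
On the narrower question of why the two summaries disagree, a minimal sketch may help. It assumes simulated data (the values are invented; only the names y, B and C come from the question). summary.aov() reports sequential F tests, so each term is adjusted only for the terms entered before it, whereas summary.lm() reports t tests in which each coefficient is adjusted for all the others. With correlated covariates the two agree only for the last term in the formula.

```r
# Simulated data: names mirror the question, values are made up for illustration.
set.seed(1)
n <- 60
B <- rnorm(n)
C <- 0.7 * B + rnorm(n)                 # correlated with B, so term order matters
y <- 1 + 0.5 * B + 0.5 * C + rnorm(n)

fit <- lm(y ~ B + C)

aov_tab <- summary.aov(fit)[[1]]        # sequential F tests, rows: B, C, Residuals
lm_tab  <- coef(summary(fit))           # t tests, each adjusted for all other terms

# C is entered last, so it is adjusted for B in both tables: F equals t^2.
F_C <- aov_tab[2, "F value"]
t_C <- lm_tab["C", "t value"]
print(all.equal(F_C, t_C^2))

# B is entered first, so its F test ignores C and differs from t^2.
F_B <- aov_tab[1, "F value"]
t_B <- lm_tab["B", "t value"]
print(c(sequential_F = F_B, t_squared = t_B^2))
```

Refitting as lm(y ~ C + B) swaps which term the two tables agree on, which shows that the difference comes from the order of adjustment and not from any switch between ANCOVA and regression.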

What is the theoretical basis for removing insignificant terms?  How 
will you compensate for this in the final analysis (e.g., how do you 
unbias your estimate of sigma squared)?
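
The concern about sigma squared can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are pure noise with true sigma^2 = 1, the sizes n, p and nsim are arbitrary, and terms are screened in a single pass at p < 0.05 as a simplified stand-in for one-at-a-time deletion. The full model's residual variance should average close to 1, while the variance estimated after dropping non-significant terms tends to come out too small.

```r
# Hypothetical setup: y is unrelated to every predictor, so the true sigma^2 is 1.
set.seed(2008)
n <- 50; p <- 10; nsim <- 2000
sigma2_full <- numeric(nsim)
sigma2_sel  <- numeric(nsim)

for (i in seq_len(nsim)) {
  X <- matrix(rnorm(n * p), n, p)
  colnames(X) <- paste0("x", seq_len(p))
  dat <- data.frame(y = rnorm(n), X)

  full  <- lm(y ~ ., data = dat)
  pvals <- coef(summary(full))[-1, "Pr(>|t|)"]   # drop the intercept row
  keep  <- names(pvals)[pvals < 0.05]            # single-pass screening

  # Refit with only the "significant" terms (intercept only if none survive).
  sel <- if (length(keep) > 0) {
    lm(reformulate(keep, response = "y"), data = dat)
  } else {
    lm(y ~ 1, data = dat)
  }

  sigma2_full[i] <- summary(full)$sigma^2
  sigma2_sel[i]  <- summary(sel)$sigma^2
}

print(mean(sigma2_full))   # full-model estimate of sigma^2
print(mean(sigma2_sel))    # estimate after selection
```

The selected model is credited with the degrees of freedom of every dropped term, yet those terms were dropped precisely because they explained little, so the residual sum of squares grows by less than the degrees of freedom suggest.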

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
