[R] model simplification using Crawley as a guide

Wed Jun 11 16:33:55 CEST 2008

On Wed, Jun 11, 2008 at 6:42 AM, Frank E Harrell Jr
<f.harrell at vanderbilt.edu> wrote:
> ChCh wrote:
>>
>> Hello,
>>
>> I have consciously avoided using step() for model simplification in favour
>> of manually updating the model by removing non-significant terms one at a
>> time.  I'm using The R Book by M.J. Crawley as a guide. It comes as no
>> surprise that my analysis does not proceed as smoothly as Crawley's and,
>> being a beginner, I'm struggling with what to do next.
>> I have a model:
>>
>> lm(y~A * B * C)
>>
>> where A is a categorical variable with three levels and B and C are
>> continuous covariates.
>>
>> Following Crawley, I execute the model, then use summary.aov() to identify
>> non-significant terms.  I begin deleting non-significant interaction terms
>> one at a time (using update).  After each update() statement, I use
>> anova(modelOld,modelNew) to contrast the previous model with the updated
>> one.  After removing all the interaction terms, I'm left with:
>>
>> lm(y~ A + B + C)
>>
>> again, using summary.aov() I identify A to be non-significant, so I remove
>> it, leaving:
>>
>> lm(y~B + C)
>>
>> where both B and C are continuous variables.
>>
>> Does it still make sense to use summary.aov() or should I use summary.lm()
>> instead?  Has the analysis switched from an ANCOVA to a regression?  Both
>> give different results so I'm uncertain which summary to accept.
>>
>> Any help would be appreciated!
>>
>>
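
On the narrower question of why the two summaries disagree: summary.aov()
reports sequential F-tests (each term adjusted only for the terms entered
before it), while summary.lm() reports marginal t-tests (each coefficient
adjusted for all the others). A minimal sketch of the difference follows.
The data are simulated and every value is invented for illustration; only
the names y, B and C come from the model above.

```r
# Simulated data: the names y, B, C mirror the model in the question,
# but the values are made up purely for illustration.
set.seed(1)
n <- 50
B <- rnorm(n)
C <- 0.6 * B + rnorm(n)          # correlated with B, so term order matters
y <- 1 + 0.5 * B + 0.3 * C + rnorm(n)

fit <- lm(y ~ B + C)

# Sequential F-tests: B on its own, then C after B.
# anova(fit) gives the same sequential tests as summary.aov(fit).
aov_tab <- anova(fit)

# Marginal t-tests: each coefficient adjusted for all the others.
lm_tab <- coef(summary(fit))

# For the last term entered the two agree: F equals t squared.
all.equal(aov_tab["C", "F value"], lm_tab["C", "t value"]^2)

# For B they generally differ, because the sequential table ignores C
# when testing B. drop1() gives marginal F-tests instead.
drop1(fit, test = "F")
```

Neither table is the "ANCOVA" or the "regression" answer; they come from the
same fitted model and test different hypotheses.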
>
> What is the theoretical basis for removing insignificant terms?  How will
> you compensate for this in the final analysis (e.g., how do you unbias your
> estimate of sigma squared)?

And in a similar vein, where are your exploratory graphics?  How do
you know that there is a linear relationship between your response and
your predictors?  Are the distributional assumptions you are making
appropriate?
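
Checks of this kind might be sketched as follows. The data frame dat and
all of its values are hypothetical stand-ins for the poster's data (a
three-level factor A and two continuous covariates B and C); only base R
graphics are used.

```r
# Hypothetical data standing in for the real data set.
set.seed(2)
n <- 60
dat <- data.frame(
  A = factor(rep(c("a1", "a2", "a3"), each = n / 3)),
  B = runif(n, 0, 10),
  C = runif(n, 0, 5)
)
dat$y <- 2 + 0.8 * dat$B - 0.5 * dat$C + rnorm(n)

# Look at the raw data first: pairwise scatterplots coloured by level of A,
# then y against B separately within each level of A.
pairs(dat[, c("y", "B", "C")], col = as.integer(dat$A))
coplot(y ~ B | A, data = dat)

fit <- lm(y ~ A + B + C, data = dat)

# Standard diagnostic plots: residuals vs fitted (linearity), normal Q-Q
# (distribution of residuals), scale-location (constant variance) and
# residuals vs leverage (influential points).
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

Curvature in the residuals-vs-fitted panel or a funnel shape in the
scale-location panel would suggest that the linear model is not adequate,
whichever terms are kept.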

Hadley

-- 
http://had.co.nz/