[R] model simplification using Crawley as a guide

Ben Bolker bolker at ufl.edu
Wed Jun 11 23:11:27 CEST 2008

Lucke, Joseph F <Joseph.F.Lucke <at> uth.tmc.edu> writes:

> And to follow FH and HW
> What level of significance are you using? .05 is excessively liberal.
> Are you adjusting your p-values for the number of possible models? Do
> you realize the p-values for dropping a term, being selected as the
> maximum of a set of p-values, do not follow their usual distributions?
> How are you compensating for sample size, as a p-value's being
> significant is a function of sample size?  How are you compensating for
> the fact that the current model choice is dependent on the previous
> model choices? How do you know your tree of model choices is the optimal
> one?  Have you considered cross-validation?  Are you looking for a model
> that truly describes a phenomenon or a predictive model that can be used
> for practical purposes?

   Ouch.  While Frank Harrell and Joseph Lucke are raising
serious issues about model selection, maybe we could keep in mind that
we don't want to scare off all the students who ever try to use R
to figure out basic statistics.  I would follow Peter Dalgaard's advice
(about "drop1") and Hadley Wickham's (about graphical diagnostics), 
and if possible bring up the other issues about
model selection with others around you -- if you're a student, ask
your prof. or someone in the stats department.  It can be tough
to try to do things right if those around you are still
doing them wrong ...  If you tell us what field you're in we
may be able to point you to more subject-specific references
(e.g. Whittingham, M. J., P. A. Stephens, R. B. Bradbury, and
R. P. Freckleton. 2006. Why do we still use stepwise modelling in
ecology and behaviour? Journal of Animal Ecology 75(5): 1182-1189).
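
   As a rough sketch of the two suggestions above (single-term
deletions with drop1, plus graphical diagnostics), something like
the following could be a starting point.  The built-in mtcars data
and the particular formula are assumptions chosen purely for
illustration; they are not the original poster's data or model.

```r
# Illustrative only: mtcars and this formula are assumptions,
# not the original poster's data or model.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Single-term deletions: each row compares the full model with
# the model that omits that one term (F test for an lm fit).
single_term <- drop1(fit, test = "F")
print(single_term)

# Standard graphical diagnostics for an lm fit: residuals vs fitted,
# normal Q-Q, scale-location, and residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

The p-values that drop1 reports are still subject to the caveats
raised in the quoted message, so they are probably better read as a
rough guide than as a formal selection rule.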

   Ben Bolker
