[R] Stepwise regression

Greg Snow Greg.Snow at intermountainmail.org
Thu Dec 14 18:53:02 CET 2006


You may want to look at a book that was published more recently than 17
years ago (computing has changed a lot since then).  Doing stepwise
regression using p-values is one approach (and when p-values were the
easiest, or only, thing to compute, it was reasonable to use them).  But
think about how many p-values you would compute and compare to 0.05 in a
stepwise regression.  Now think about how many you would have computed if
your data had come from a different sample.  What is your type I error
rate?  Is the usual p-value theory even meaningful in this situation?
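To put rough numbers on the type I error question, here is a minimal Python sketch (not R). It assumes m *independent* tests, each at level 0.05. The tests inside a stepwise search are neither independent nor fixed in number, so this is only an illustration of how quickly the chance of at least one spurious "significant" term grows. The function name `familywise_error` is hypothetical.

```python
# Illustrative only: assumes m *independent* tests, each at level alpha.
# A stepwise search makes dependent tests whose number depends on the data,
# so the real error rate is harder to pin down than this formula suggests.


def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false rejection among m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 5, 10, 20, 50):
    print(f"{m:3d} tests -> P(at least one false positive) = {familywise_error(m):.3f}")
```

With 20 such tests the probability of at least one false positive is already about 0.64. The first step of a forward search over 20 candidate variables alone compares 20 p-values to 0.05.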

There are several criteria that can be used in stepwise regression to
decide which term to add or drop.  The p-value (or F-statistic) is only
one; others include AIC, BIC, adjusted R-squared, PRESS, gut feeling,
prior knowledge, cost, ...
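For a linear model with Gaussian errors, several of these criteria can be written in terms of the residual sum of squares (RSS), the sample size n, and the number of estimated coefficients k. The Python sketch below is only an illustration, and the function names are hypothetical. The AIC line uses n*log(RSS/n) + 2k, which, as far as I know, is the form R's extractAIC() uses for lm fits. That form drops constants that cancel when comparing models fitted to the same data. step()'s `k` argument replaces the 2, and k = log(n) gives a BIC-style penalty. PRESS needs each observation's leverage (hat value), so here it takes the residuals and leverages directly.

```python
import math


def selection_criteria(rss, n, k, tss):
    """Model-selection criteria for a Gaussian linear model.

    rss: residual sum of squares of the fitted model
    n:   number of observations
    k:   number of estimated coefficients, including the intercept
    tss: total sum of squares of y about its mean (for adjusted R-squared)
    """
    fit = n * math.log(rss / n)  # -2*log-likelihood up to an additive constant
    r2 = 1.0 - rss / tss
    return {
        "AIC": fit + 2 * k,            # smaller is better
        "BIC": fit + math.log(n) * k,  # smaller is better; heavier penalty once n >= 8
        "adjR2": 1.0 - (1.0 - r2) * (n - 1) / (n - k),  # larger is better
    }


def press(residuals, leverages):
    """PRESS: sum of squared leave-one-out residuals e_i / (1 - h_i)."""
    return sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverages))
```

When comparing two fits to the same data, prefer the one with the smaller AIC (or BIC). The constants dropped here are the same for both fits, so they do not change which one wins.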

Some of these have better properties than p-values, but most still
suffer from the fact that a small change in the data can result in a
very different model.

Look at the lars, lasso2, and BMA packages for some more modern
alternatives to stepwise regression.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
Timothy.Mak at iop.kcl.ac.uk
Sent: Thursday, December 14, 2006 9:28 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Stepwise regression

Dear all, 

I am wondering why the step() procedure in R has the description 'Select
a formula-based model by AIC'. 

I have been using Stata and SPSS, and neither package made any reference
to AIC in its stepwise procedure.  I also read in an earlier R-help post
that step() is really the 'usual' way of doing stepwise regression (R-help
post from Prof Ripley, Fri, 2 Apr 1999 05:06:03 +0100 (BST)).

My understanding of the 'usual' way of doing, say, forward regression is
that variables whose p-value drops below a criterion (commonly 0.05)
become candidates for inclusion in the model, the one with the lowest
p-value among these is chosen, and the step is repeated until all
p-values for variables not in the model are above 0.05 (cf. Hosmer and
Lemeshow (1989), Applied Logistic Regression).  The procedure does not
require examining the AIC.

I am not well enough acquainted with R to understand the code used in
step(), so can somebody tell me how step() works?
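Here is a rough answer to the question of how step() works, written as a Python sketch rather than R. As I understand its documentation, step() works in a loop:

1. Try adding or dropping one term.
2. Compute a penalized fit criterion for each candidate model. This is AIC by default, via extractAIC().
3. Move to the candidate with the lowest value.
4. Stop when no single move lowers it.

The code below implements that greedy search for an ordinary linear model only. The names `ols_rss`, `penalized_fit`, and `step_both` and the toy data are hypothetical. The real step() works on model formulas, honours its `scope` and `direction` arguments, and respects marginality, none of which is modelled here.

```python
import math


def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X.

    X is a list of rows. Solves the normal equations by Gaussian elimination
    with partial pivoting; there is no guard against collinear columns.
    """
    p = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda row: abs(a[row][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, p):
            f = a[r][c] / a[c][c]
            for j in range(c, p + 1):
                a[r][j] -= f * a[c][j]
    # Back substitution for the coefficients
    beta = [0.0] * p
    for c in reversed(range(p)):
        s = sum(a[c][j] * beta[j] for j in range(c + 1, p))
        beta[c] = (a[c][p] - s) / a[c][c]
    return sum((yi - sum(b * xi for b, xi in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def penalized_fit(terms, data, y, k=2.0):
    """n*log(RSS/n) + k*edf for an intercept plus the given terms (k=2: AIC)."""
    n = len(y)
    cols = sorted(terms)
    X = [[1.0] + [data[t][i] for t in cols] for i in range(n)]
    edf = len(cols) + 1
    return n * math.log(ols_rss(X, y) / n) + k * edf


def step_both(data, y, start=(), k=2.0):
    """Greedy add-or-drop-one-term search; stops when no move lowers the score."""
    current = frozenset(start)
    score = penalized_fit(current, data, y, k)
    while True:
        moves = [current | {t} for t in data if t not in current]  # add one term
        moves += [current - {t} for t in current]                   # drop one term
        if not moves:
            return sorted(current), score
        best_score, best_terms = min(
            (penalized_fit(m, data, y, k), tuple(sorted(m))) for m in moves
        )
        if best_score >= score:  # no single move improves the criterion
            return sorted(current), score
        current, score = frozenset(best_terms), best_score


# Hypothetical toy data: y depends on x1; x2 is unrelated to the trend.
n = 30
x1 = [i / 5 for i in range(n)]
x2 = [math.cos(0.9 * i) for i in range(n)]
y = [2 + 3 * a + 0.5 * math.sin(1.7 * i) for i, a in enumerate(x1)]
terms, aic = step_both({"x1": x1, "x2": x2}, y)
print(terms, round(aic, 2))
```

The stopping rule is the key difference from the p-value procedure described above. Consider adding a single one-degree-of-freedom term in a large sample. Requiring AIC to decrease then works out to roughly a likelihood-ratio test at about the 0.157 level, which is the probability that a chi-square variable with one degree of freedom exceeds 2. So the AIC rule tends to admit more terms than a 0.05 cutoff.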

Thanks very much, 

Tim

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

