[R] Stepwise regression

Marc Schwartz marc_schwartz at comcast.net
Thu Dec 14 18:02:47 CET 2006


On Thu, 2006-12-14 at 14:37 +0000, Timothy.Mak at iop.kcl.ac.uk wrote:
> Dear all, 
> 
> I am wondering why the step() procedure in R has the description 'Select a 
> formula-based model by AIC'. 
> 
> I have been using Stata and SPSS and neither package made any reference to 
> AIC in its stepwise procedure, and I read in an earlier R-help post that 
> step() is really the 'usual' way of doing stepwise regression (R-help post 
> from Prof. Ripley, Fri, 2 Apr 1999 05:06:03 +0100 (BST)). 
> 
> My understanding of the 'usual' way of doing, say, forward regression is 
> that variables whose p value drops below a criterion (commonly 0.05) 
> become candidates for inclusion in the model, the one with the 
> lowest p among these is chosen, and the step is repeated until the p 
> values of all variables not in the model are above 0.05 (cf. Hosmer and 
> Lemeshow (1989), Applied Logistic Regression). The procedure does not 
> require examination of the AIC. 
> 
> I am not well enough acquainted with R to understand the code used in 
> step(), so can somebody tell me how step() works?
> 
> Thanks very much, 
> 
> Tim

> library(fortunes)

> fortune("stepwise")

Frank Harrell: Here is an easy approach that will yield results only
slightly less valid than one actually using the response variable:
  x <- data.frame(x1, x2, x3, x4, ..., other potential predictors)
  x[ , sample(ncol(x))]
Andy Liaw: Hmm... Shouldn't that be something like:
  x[, sample(ncol(x), ceiling(ncol(x) * runif(1)))]
   -- Frank Harrell and Andy Liaw (about alternative strategies for
      stepwise regression and `random parsimony')
      R-help (May 2005)


But seriously, using:

  RSiteSearch("stepwise")

will provide links to prior discussions on why stepwise-based model
building is to be avoided.

A copy of Frank's book, Regression Modeling Strategies (more info here):

  http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

will also provide insight.
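On the original question of how step() works: for an lm() fit it compares
candidate models by AIC rather than by p values. At each step it refits the
model with each eligible term added (or dropped), keeps the change giving the
lowest AIC, and stops when no change lowers it. Below is a minimal sketch of
one forward step done by hand. It assumes the built-in mtcars data and an
arbitrary candidate set (wt, hp, disp, qsec) chosen only for illustration, and
aic_lm() is a hypothetical helper written to reproduce what extractAIC()
returns for lm fits:

```r
# Illustrative data only: the built-in mtcars data set (an assumption for
# this sketch; any lm() fit would do).
null_fit   <- lm(mpg ~ 1, data = mtcars)
candidates <- ~ wt + hp + disp + qsec   # hypothetical candidate pool

# Hypothetical helper: the criterion extractAIC() computes for lm fits,
# n * log(RSS / n) + 2 * edf, where edf = number of estimated coefficients.
aic_lm <- function(fit) {
  n   <- length(residuals(fit))
  rss <- sum(residuals(fit)^2)
  n * log(rss / n) + 2 * (n - df.residual(fit))
}
stopifnot(isTRUE(all.equal(aic_lm(null_fit), extractAIC(null_fit)[2])))

# One forward step by hand: add1() refits the model with each candidate
# added and reports its AIC; the "<none>" row is the current model.
one_step <- add1(null_fit, scope = candidates)
print(one_step)
best <- rownames(one_step)[which.min(one_step$AIC)]  # term step() would add

# step() repeats this until no addition lowers the AIC any further.
fwd <- step(null_fit, scope = list(lower = ~ 1, upper = candidates),
            direction = "forward", trace = 0)
print(formula(fwd))
```

Adding a single 1-df term lowers this AIC only when the likelihood-ratio
statistic n * log(RSS_old / RSS_new) exceeds 2. So with the default k = 2,
step() behaves roughly like a p-value entry threshold of about 0.157
(pchisq(2, 1, lower.tail = FALSE)), not 0.05.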


HTH,

Marc Schwartz


