[R] Stepwise Regression and PLS

Frank E Harrell Jr feh3k at spamcop.net
Sun Feb 1 20:31:34 CET 2004


On Sun, 1 Feb 2004 11:09:28 -0800 (PST)
Jinsong Zhao <jinsong_zh at yahoo.com> wrote:

> Dear all,
> 
> I am a newcomer to R. I intend to use R to do
> stepwise regression and PLS with a data set (a 55x20
> matrix, with one dependent and 19 independent
> variables). Based on the same data set, I have done the
> same work using SPSS and SAS. However, the results
> obtained with R differ considerably from those of SPSS
> or SAS.
> 
> In the case of stepwise regression, SPSS gave a model with 4
> independent variables, but with step(), R gave a
> model with 10 and a much higher R2. Furthermore,
> regsubsets() also indicates that the 10-variable model is
> one of the best regression subsets. How can this
> difference be explained? And for my data set, how many
> variables would it be reasonable to enter into the model?
> 
> In the case of PLS, the results of the mvr function of the
> pls.pcr package also differ from those of SAS.
> Although the optimal number of latent variables is the
> same, the difference in R2 is large. Why?
> 
> Any comments and suggestions are much appreciated. Thanks
> in advance!
> 
> Best wishes,
> 
> Jinsong Zhao
> 

In your case SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and every
other package will agree in one sense, because results from all of them
will be virtually meaningless.  Simulate some data from a known model and
you'll quickly find out why stepwise variable selection is often a train
wreck.
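
A minimal sketch of such a simulation, sized like the data in question
(55 observations, 19 candidate predictors). The true model, the
coefficients, the seed, and all variable names below are assumptions
made up for illustration; only x1, x2 and x3 actually affect y. It
counts how many pure-noise predictors step() retains and records the
apparent R2 of each selected model.

```r
# Illustrative simulation: stepwise selection when the truth is known.
# Assumed true model: y = x1 + 0.5*x2 + 0.5*x3 + noise; x4..x19 are pure noise.
set.seed(2004)
n <- 55                      # observations, as in the original data set
p <- 19                      # candidate predictors
xnames    <- paste0("x", 1:p)
true_vars <- c("x1", "x2", "x3")
upper     <- reformulate(xnames, response = "y")  # largest model step() may reach

n_sim <- 50
res <- matrix(NA_real_, n_sim, 3,
              dimnames = list(NULL, c("n_selected", "n_noise", "apparent_r2")))

for (i in seq_len(n_sim)) {
  # Fresh data set drawn from the same known model
  d <- as.data.frame(matrix(rnorm(n * p), n, p))
  names(d) <- xnames
  d$y <- d$x1 + 0.5 * d$x2 + 0.5 * d$x3 + rnorm(n)

  # Stepwise search from the intercept-only model; step() selects by AIC
  fit <- step(lm(y ~ 1, data = d), scope = upper,
              direction = "both", trace = 0)

  chosen <- attr(terms(fit), "term.labels")
  res[i, ] <- c(length(chosen),                 # variables selected
                sum(!chosen %in% true_vars),    # of which pure noise
                summary(fit)$r.squared)         # R2 on the same data
}

print(colMeans(res))   # averages over the simulated data sets
print(table(res[, "n_noise"]))
```

Comparing n_selected with the three variables that truly matter, and
refitting the selected model on freshly simulated data to see how the
R2 holds up, are natural next steps.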

---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University



