[R] Stepwise Regression and PLS
jinsong_zh at yahoo.com
Mon Feb 2 04:13:49 CET 2004
--- Frank E Harrell Jr <feh3k at spamcop.net> wrote:
> On Sun, 1 Feb 2004 11:09:28 -0800 (PST)
> Jinsong Zhao <jinsong_zh at yahoo.com> wrote:
> > Dear all,
> > I am a newcomer to R. I intend to use R to do
> > stepwise regression and PLS with a data set (a
> > matrix, with one dependent and 19 independent
> > variables). Based on the same data set, I have done
> > the same work using SPSS and SAS. However, there is
> > a difference between the results obtained by R and
> > those from SPSS or SAS.
> > In the case of stepwise regression, SPSS gave a
> > model with 4 independent variables, but with step(),
> > R gave a model with 10 and a much higher R2.
> > Furthermore, regsubsets() also indicates that the
> > 10-variable model is one of the best regression
> > subsets. How to explain this
> > difference? And in the case of my data set, how
> > variables that enter the model would be
> > In the case of PLS, the results of mvr function of
> > pls.pcr package is also different with that of
> > Although the number of optimum latent variables is
> > the same, the difference between the R2 values is large.
> > Any comments and suggestions are much appreciated.
> > Thanks in advance!
> > Best wishes,
> > Jinsong Zhao
> In your case SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and every
> other package will agree in one sense, because results from all of them
> will be virtually meaningless. Simulate some data from a known model and
> you'll quickly find out why stepwise variable selection is often a train
> wreck.
> Frank E Harrell Jr Professor and Chair
> School of Medicine
> Department of Biostatistics
> Vanderbilt University
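A minimal sketch of the simulation suggested above, in base R. The sample size, the coefficients, and the choice of which two predictors are "real" are all assumptions made for illustration; only the count of 19 candidate predictors comes from the original question.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 50                            # hypothetical sample size
p <- 19                            # 19 candidate predictors, as in the question
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)

# Assumed "true" model: only x1 and x2 affect y; x3..x19 are pure noise
y <- 1 + 2 * X[, "x1"] - 1.5 * X[, "x2"] + rnorm(n)
dat <- data.frame(y = y, X)

full <- lm(y ~ ., data = dat)
sel  <- step(full, trace = 0)      # backward elimination by AIC (the default)

selected <- names(coef(sel))[-1]   # drop the intercept
noise    <- setdiff(selected, c("x1", "x2"))
r2       <- summary(sel)$r.squared

print(selected)                    # variables kept by step()
print(noise)                       # kept variables that are noise by construction
print(r2)                          # in-sample R2 of the selected model
```

Rerunning with different seeds shows how often noise variables survive selection. Note also that step() selects by AIC, whereas the stepwise procedure in SPSS uses F-test entry and removal thresholds, so the two need not agree on the number of variables.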
For the case of stepwise regression, I have found that
the subsets I got using regsubsets() are collinear.
However, the variables in SPSS's result are not
collinear. I wonder what I should do to get the same or
a better linear model.
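One way to quantify the collinearity mentioned above is the variance inflation factor (VIF), computed here in base R by regressing each predictor on the others. This is only a sketch: vif_manual() is a hypothetical helper written for this example, and the data are simulated with x2 deliberately made nearly equal to x1.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R2_j), where R2_j comes from
# regressing predictor j on all the other predictors in df
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

set.seed(2)                          # arbitrary seed
n  <- 100                            # hypothetical sample size
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)        # nearly collinear with x1 by construction
x3 <- rnorm(n)                       # unrelated to x1 and x2
preds <- data.frame(x1 = x1, x2 = x2, x3 = x3)

v <- vif_manual(preds)
print(round(v, 2))                   # large values flag collinear predictors
```

Applied to the predictors chosen by regsubsets(), large values would confirm that the selected subset is collinear; what cutoff counts as "large" is a judgment call.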