[R] Stepwise Regression and PLS

Liaw, Andy andy_liaw at merck.com
Mon Feb 2 15:46:03 CET 2004

Just a few more comments to what Chris said:

Collinearity usually arises in two situations:
1. Insufficient sample; i.e., data points that would make the variables
_less_ collinear are not included in the sample.
2. The variables are `naturally' correlated.

If it's the first, then #2 from the list Chris cited is a possible option.
Otherwise, I'd say shrinkage makes more sense than regressing on principal
components.  Both are in the same class of biased estimators, but with PCR
one needs to be lucky to have the first few PCs correlate well with the
response.  In any case, interpretation of model coefficients from such data
will likely be difficult.
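
To make the point about PCR concrete, here is a minimal sketch in plain
Python (not from the original thread; the simulated data, seed, and noise
levels are all assumptions chosen for illustration).  Two nearly collinear
predictors are generated, and the response is made to depend on their
_difference_, which lies along the low-variance second PC.  The first PC
then carries almost no information about the response.

```python
import math
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cor(u, v):
    u, v = center(u), center(v)
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

random.seed(1)  # arbitrary seed, for reproducibility only
n = 200

# Assumed toy data: x2 is x1 plus a little noise, so the two are collinear.
x1 = center([random.gauss(0, 1) for _ in range(n)])
x2 = center([a + random.gauss(0, 0.1) for a in x1])
# The response depends on the difference x1 - x2 (the second PC direction).
y = [(a - b) + random.gauss(0, 0.05) for a, b in zip(x1, x2)]

# Sample covariance matrix [[s11, s12], [s12, s22]] of the predictors.
s11 = dot(x1, x1) / (n - 1)
s12 = dot(x1, x2) / (n - 1)
s22 = dot(x2, x2) / (n - 1)

# Closed-form eigenvalues of a symmetric 2x2 matrix.
half_gap = math.sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)
lam1 = (s11 + s22) / 2 + half_gap  # variance of the first PC
lam2 = (s11 + s22) / 2 - half_gap  # variance of the second PC

def pc_scores(lam):
    # The eigenvector for eigenvalue lam is proportional to (s12, lam - s11)
    # whenever s12 != 0, which holds here because x1 and x2 are correlated.
    v1, v2 = s12, lam - s11
    norm = math.hypot(v1, v2)
    return [(v1 * p + v2 * q) / norm for p, q in zip(x1, x2)]

r_pc1 = cor(y, pc_scores(lam1))
r_pc2 = cor(y, pc_scores(lam2))

print("variance of PC1, PC2:", lam1, lam2)
print("cor(y, PC1):", r_pc1)
print("cor(y, PC2):", r_pc2)
```

With data built this way, a PCR fit that keeps only the first component
would discard the direction that actually predicts the response.  Real data
need not behave like this, of course; the sketch only shows that nothing
guarantees the high-variance PCs are the useful ones.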

Just my $0.02...


> From: Chris Lawrence
Peter Kennedy, in "A Guide to Econometrics" (pp. 187-89) suggests the 
following options for dealing with collinearity:

1. "Do nothing."  The main problem in OLS when variables are collinear 
is that the estimated variances of the parameters are often inflated.
2. Obtain more data.
3. Formalize relationships among regressors (for example, in a 
simultaneous equation model).
4. Specify a relationship among the *parameters*.
5. Drop one or more variables.  (In essence, a subset of #4 where 
coefficients are set to zero.)
6. Incorporate estimates from other studies.  (A Bayesian might consider 
using a strong prior.)
7. Form a principal component from the variables, and use that instead.
8. Shrink the OLS estimates using the ridge or Stein estimators.
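
As a rough illustration of options 1 and 8 (not taken from Kennedy's book;
the simulated data, the seed, and the penalty value are assumptions), the
sketch below uses plain Python to compute the variance inflation factor
1/(1 - r^2) for two collinear regressors, and compares the OLS coefficients
with ridge estimates obtained by solving (X'X + lambda*I) b = X'y.  The
helper `ridge2` is a hypothetical name and only handles the two-predictor,
centered case.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ridge2(x1, x2, y, lam):
    """Ridge estimate for two centered predictors.

    Solves (X'X + lam*I) b = X'y with the explicit 2x2 inverse.
    lam = 0 gives the ordinary least squares solution.
    """
    a = dot(x1, x1) + lam
    b = dot(x1, x2)
    c = dot(x2, x2) + lam
    g1, g2 = dot(x1, y), dot(x2, y)
    det = a * c - b * b
    return ((c * g1 - b * g2) / det, (a * g2 - b * g1) / det)

random.seed(42)  # arbitrary seed, for reproducibility only
n = 50

# Assumed toy data: x2 is nearly a copy of x1; true slopes are both 1.
x1 = center([random.gauss(0, 1) for _ in range(n)])
x2 = center([a + random.gauss(0, 0.05) for a in x1])
y = center([a + b + random.gauss(0, 1) for a, b in zip(x1, x2)])

# Sample correlation between the regressors and the resulting VIF.
r = dot(x1, x2) / (dot(x1, x1) * dot(x2, x2)) ** 0.5
vif = 1.0 / (1.0 - r * r)

ols = ridge2(x1, x2, y, 0.0)
ridge = ridge2(x1, x2, y, 5.0)  # lam = 5 is an arbitrary illustrative penalty

print("correlation between regressors:", r)
print("variance inflation factor:", vif)
print("OLS coefficients:  ", ols)
print("ridge coefficients:", ridge)
```

The VIF shows how much the sampling variance of each slope is inflated
relative to uncorrelated regressors.  The ridge coefficient vector is always
shorter than the OLS one for a positive penalty; that shrinkage is the bias
traded for reduced variance.  In practice the penalty would be chosen by
something like cross-validation rather than fixed by hand.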
