[R] linear models and colinear variables...

Peter Gaffney petertgaffney at yahoo.com
Thu Jul 1 01:32:59 CEST 2004


Hi!

I'm having some issues on both conceptual and
technical levels for selecting the right combination
of variables for this model I'm working on. The basic,
all inclusive form looks like

lm(mic ~ B * D * S * U * V * ICU)

Where mic, U, V, and ICU are numeric values and B, D,
and S are factors with about 16, 16 and 2 levels
respectively. In short, there's a ton of actual
explanatory variables that look something like this:

Bstaph.aureus:Dvan:Sr:U:ICU

There are a good number of hits but there's also a
staggering number of complete misses, due to a
combination of scarce data in that particular niche and
actual lack of deviation from the categorical mean. 
My suspicion is that there's a large degree of
colinearity in some of these variables that serves to
reduce the total effect of either of a nearly colinear
pair to an insignificant level; my hope is that
removing one of a mostly colinear group would allow
the other variables' possibly significant effects to
be measured.

Question 1) Is this legitimate at all? Can I do
regression using the entire data set over only
selected factors while ignoring others?
(Admittedly I only just got my Bachelor's in math; the
gaps in my knowledge here are profound and
aggravating.)

Question 2) How do I go about selecting possible
colinear explanatory variables?
I had originally thought I'd just make a matrix of
coefficients of colinearity for each pair of variables
and iteratively re-run the model until I got the
results I wanted, but I can't really figure out how to
do this.  In addition, I'm not sure how to do this in
the model syntax once I've actually decided on some
variables to exclude.
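
A minimal sketch of the "matrix of pairwise coefficients" idea,
assuming that correlations between the columns of the expanded design
matrix are an acceptable first screen. The data frame dat, its factor
levels, the shortened formula, and the 0.9 cutoff are all hypothetical
stand-ins, not the real data or model. Pairwise correlation only
catches two-variable near-collinearity; a column that is a combination
of several others will not show up here.

```r
# Hypothetical toy data standing in for the real data set
set.seed(1)
n <- 200
dat <- data.frame(
  B   = factor(sample(c("staph.aureus", "e.coli", "kleb"), n, replace = TRUE)),
  S   = factor(sample(c("r", "s"), n, replace = TRUE)),
  U   = rnorm(n),
  ICU = rnorm(n)
)
dat$V   <- dat$U + rnorm(n, sd = 0.05)  # V is deliberately almost collinear with U
dat$mic <- 1 + 2 * dat$U + rnorm(n)

# Expand the right-hand side into the numeric columns lm() would fit
X <- model.matrix(~ B * S + U + V + ICU, data = dat)
X <- X[, colnames(X) != "(Intercept)", drop = FALSE]

# Pairwise correlations between design columns, each pair listed once
r <- cor(X)
r[upper.tri(r, diag = TRUE)] <- NA
high <- which(abs(r) > 0.9, arr.ind = TRUE)  # 0.9 is an arbitrary cutoff

pairs <- data.frame(
  var1 = rownames(r)[high[, "row"]],
  var2 = colnames(r)[high[, "col"]],
  r    = r[high],
  stringsAsFactors = FALSE
)
print(pairs)
```

With the toy data only the U/V pair should be flagged, since V was
built from U. On the real model the same call would be made on
model.matrix() of the full formula, which may be large.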
For instance, supposing I wanted to run the model as
above without the variable
Bstaph.aureus:Dvan:Sr:U:ICU.  What I tried was

lm(mic ~ B * D * S * U * V * ICU -
Bstaph.aureus:Dvan:Sr:U:ICU).

Obviously this doesn't work because the variable name
Bstaph.aureus:Dvan:Sr:U:ICU hasn't been recognized
yet.  How do I do this?  My best guess so far is to
build and define each of the variables like
Bstaph.aureus:Dvan:Sr:U:ICU by hand with some
imperative/iterative style programming using some kind
of string generation system.  This sounds like a royal
pain, and is something I'd rather avoid doing if at
all possible.
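
One way to avoid building the names by hand, sketched under the
assumption that fitting on an explicit design matrix is acceptable:
let model.matrix() generate the columns, drop the unwanted one by
name, and pass the remaining matrix to lm(). The toy data, the factor
levels, and the smaller formula mic ~ B * S * U are hypothetical; the
column name follows R's default treatment-contrast naming (factor name
followed by level). This only shows the mechanics and says nothing
about whether removing a single interaction column is statistically
sensible.

```r
# Hypothetical toy data; the real model has more terms and levels
set.seed(2)
n <- 120
dat <- data.frame(
  B = factor(sample(c("e.coli", "staph.aureus"), n, replace = TRUE)),
  S = factor(sample(c("r", "s"), n, replace = TRUE)),
  U = rnorm(n)
)
dat$mic <- 1 + 0.5 * dat$U + rnorm(n)

# Full design matrix, minus the intercept column (lm() adds its own)
X <- model.matrix(mic ~ B * S * U, data = dat)[, -1]
print(colnames(X))  # inspect the generated names before choosing one

# Drop one column by its exact generated name
drop_col <- "Bstaph.aureus:Ss:U"
stopifnot(drop_col %in% colnames(X))
X_reduced <- X[, colnames(X) != drop_col, drop = FALSE]

# Fit with and without the dropped column
fit_full    <- lm(mic ~ X, data = dat)
fit_reduced <- lm(mic ~ X_reduced, data = dat)

print(names(coef(fit_reduced)))
```

The coefficient names in the refitted model carry the matrix name as a
prefix (for example X_reducedU), which is a side effect of passing a
matrix on the right-hand side of the formula.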

Any suggestions? :-D

-petertgaffney



