# [R] linear models and colinear variables...

Jonathan Baron baron at psych.upenn.edu
Thu Jul 1 01:47:17 CEST 2004

```
On 06/30/04 16:32, Peter Gaffney wrote:
>Hi!
>
>I'm having some issues on both conceptual and
>technical levels for selecting the right combination
>of variables for this model I'm working on. The basic,
>all inclusive form looks like
>
>lm(mic ~ B * D * S * U * V * ICU)

When you do this, you are including all the interaction terms.
The * indicates an interaction, as opposed to +: B * D expands to
B + D + B:D, so the six-way product above contains every main
effect and every interaction up to the six-way one.  That might
make sense under some circumstances, for example if you just want
to check that the higher-order interactions are not significant,
but usually it does more to obscure the interesting effects than
to display them.
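
A minimal sketch of what the * operator expands to.  The names y,
a, b and c are placeholders, and no data are needed because
terms() only parses the formula:

    # Term labels for a small three-way product
    labs3 <- attr(terms(y ~ a * b * c), "term.labels")
    print(labs3)

    # The six-way product from the question
    labs6 <- attr(terms(mic ~ B * D * S * U * V * ICU), "term.labels")
    length(labs6)   # 2^6 - 1 = 63 terms

These are terms, not coefficients: a factor with several levels
contributes more than one coefficient per term.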

>My suspicion is that there's a large degree of
>colinearity in some of these variables that serves to
>reduce the total effect of either of a nearly colinear
>pair to an insignificant level; my hope is that
>removing one of a mostly colinear group would allow
>the other variables' possibly significant effects to
>be measured.

There may be colinearity, but the most likely problem is that you
are including too many interactions, at too high a level.
Inclusion of nonsignificant interaction terms often turns
significant main effects into nonsignificant effects.
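
One way to cap the interaction order, sketched with the variable
names from the question: in R's formula syntax, raising a sum to
the power 2 keeps the main effects and the two-way interactions
only.  Whether two-way is the right cutoff for these data is an
assumption, not something the question settles.

    # Main effects plus all two-way interactions, nothing higher
    f2 <- mic ~ (B + D + S + U + V + ICU)^2
    labs2 <- attr(terms(f2), "term.labels")
    length(labs2)                 # 6 main effects + 15 pairs = 21
    max(attr(terms(f2), "order")) # highest interaction order kept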

>Question 1) Is this legitimate at all? Can I do
>regression using the entire data set over only
>selected factors while ignoring others?
>(Admittedly I only just got my Bachelor's in math; the
>gaps in my knowledge here are profound and
>aggravating.)

If you select predictors on the basis of which ones are
significant, then the final significance levels don't mean much,
usually.  Remember, on average about 1 test in 20 will come out
significant at .05 even if you are using random numbers.
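
A small simulation of that point.  Everything here is made up for
illustration: the response and all 20 predictors are independent
random noise, and the sample size and number of repetitions are
arbitrary choices.

    set.seed(1)
    n <- 100; k <- 20; reps <- 200
    pvals <- replicate(reps, {
      X <- matrix(rnorm(n * k), n, k)   # 20 pure-noise predictors
      y <- rnorm(n)                     # response unrelated to X
      fit <- lm(y ~ X)
      summary(fit)$coefficients[-1, 4]  # p-values, intercept dropped
    })
    mean(pvals < 0.05)   # share of "significant" noise predictors

The share should come out close to 0.05, which is why picking
predictors by their p-values and then reporting those same
p-values is misleading.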

>Question 2) How do I go about selecting possible
>colinear explanatory variables?

If there is colinearity, then what to do about it depends on the
substance of the questions you are asking.  Some options are to
combine variables, do some sort of factor analysis and use
factors rather than variables as predictors, use the most
meaningful of the variables that are colinear, or just live with
it, if the substantive issues rule out the other options.  (I'm
sure there are other solutions that others might point out.)
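
A sketch of the "combine variables" option, using simulated
numeric predictors.  The names x1, x2 and xc are hypothetical, and
the predictors in the question appear to be factors, for which
cor() would not apply directly.

    set.seed(2)
    n  <- 200
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.1)   # nearly colinear with x1
    y  <- x1 + x2 + rnorm(n)

    r12 <- cor(x1, x2)              # close to 1
    # Both predictors in the model: large standard errors
    se_sep <- summary(lm(y ~ x1 + x2))$coefficients[-1, 2]
    # One combined predictor instead: much smaller standard error
    xc <- (x1 + x2) / 2
    se_comb <- summary(lm(y ~ xc))$coefficients[2, 2]
    print(list(r12 = r12, se_sep = se_sep, se_comb = se_comb))

The combined predictor answers a different question (the effect
of the pair taken together), so it only helps when that is the
question you care about.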

>I had originally thought I'd just make a matrix of
>coefficients of colinearity for each pair of variables
>and iteratively re-run the model until I got the
>results I wanted, but I can't really figure out how to
>do this.  In addition, I'm not sure how to do this in
>the model syntax once I've actually decided on some
>variables to exclude.
>For instance, supposing I wanted to run the model as
>above without the variable
>Bstaph.aureus:Dvan:Sr:U:ICU.  What I tried was
>
>lm(mic ~ B * D * S * U * V * ICU -
>Bstaph.aureus:Dvan:Sr:U:ICU).
>
>Obviously this doesn't work because the variable name
>Bstaph.aureus:Dvan:Sr:U:ICU hasn't been recognized
>yet.  How do I do this?  My best guess so far is to

Not clear what you mean here.

>build and define each of the variables like
>Bstaph.aureus:Dvan:Sr:U:ICU by hand with some
>imperative/iterative style programming using some kind
>of string generation system.  This sounds like a royal
>pain, and is something I'd rather avoid doing if at
>all possible.
>
>Any suggestions? :-D
>
>-petertgaffney
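
If the aim is to drop one interaction from the model, here is a
sketch under one possible reading of the question.  A name like
Bstaph.aureus:Dvan:Sr:U:ICU appears to be a coefficient name,
built from factor levels when the model is fitted.  The formula
works with terms instead, so what can be subtracted is the term
B:D:S:U:ICU, which removes that interaction for all levels at
once.

    f_drop <- mic ~ B * D * S * U * V * ICU - B:D:S:U:ICU
    labs_drop <- attr(terms(f_drop), "term.labels")
    "B:D:S:U:ICU" %in% labs_drop   # FALSE: the term is gone
    length(labs_drop)              # 63 - 1 = 62 terms remain

The same formula can be passed to lm() directly.  Note that the
six-way term still contains this five-way interaction, so
dropping a lower-order term while keeping a higher-order one that
includes it is rarely what you want.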

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania