[R] dealing with multicollinearity

Manuel Gutierrez manuel_gutierrez_lopez at yahoo.es
Mon Apr 11 12:22:55 CEST 2005


I have a linear model y~x1+x2 of some data where the
coefficient for
x1 is higher than I would have expected from theory
(0.7 vs 0.88)
I wondered whether this would be an artifact due to x1
and x2 being correlated despite that the variance
inflation factor is not too high (1.065):
I used perturbation analysis to evaluate collinearity
library(perturb)
P<-perturb(A,pvars=c("x1","x2"),prange=c(1,1))
> summary(P)
Perturb variables:
x1 		 normal(0,1) 
x2 		 normal(0,1) 

Impact of perturbations on coefficients:
            mean     s.d.     min      max     
(Intercept)  -26.067    0.270  -27.235  -25.481
x1             0.726    0.025    0.672    0.882
x2             0.060    0.011    0.037    0.082

I get a mean for x1 of 0.726 which is closer to what
is expected.
I am not an statistical expert so I'd like to know if
my evaluation of the effects of collinearity is
correct and in that case any solutions to obtain a
reliable linear model.
Thanks,
Manuel

Some more detailed information:

> A<-lm(y~x1+x2)
> summary(A)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
      Min        1Q    Median        3Q       Max 
-4.221946 -0.484055 -0.004762  0.397508  2.542769 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -27.23472    0.27996 -97.282  < 2e-16 ***
x1            0.88202    0.02475  35.639  < 2e-16 ***
x2            0.08180    0.01239   6.604 2.53e-10 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.'
0.1 ` ' 1 

Residual standard error: 0.823 on 241 degrees of
freedom
Multiple R-Squared: 0.8411,	Adjusted R-squared: 0.8398

F-statistic: 637.8 on 2 and 241 DF,  p-value: <
2.2e-16 

> cor.test(x1,x2)

	Pearson's product-moment correlation

data:  x1 and x2 
t = -3.9924, df = 242, p-value = 8.678e-05
alternative hypothesis: true correlation is not equal
to 0 
95 percent confidence interval:
 -0.3628424 -0.1269618 
sample estimates:
      cor 
-0.248584




More information about the R-help mailing list