[R] Regression query

Liaw, Andy andy_liaw at merck.com
Sun Jun 13 21:44:20 CEST 2004


> From: Peter Flom
> 
> If variables are collinear, then looking at interactions among them
> doesn't make much sense.  High collinearity means that one variable is
> nearly a linear combination of others.  IOW, that variable is not adding
> much information.  So, if you look at the interaction, you are ALMOST
> looking at a quadratic (e.g., if the collinearity involves only 2
> variables, then one is very similar to the other, so X1*X2 is almost
> X1*X1).  The output will be confusing, to say the least.
> 
> Worse, when you include collinear variables, the resulting equation is
> highly sensitive to small (sometimes very small) changes in the data.
> Belsley gives an example where changes in the third decimal place result
> in totally different equations.
> 
> For details see Belsley's book titled something like "collinearity and
> weak data in regression" (sorry, the book and my files are at the
> office, but this should let you find it).

I guess you're referring to: "Conditioning Diagnostics: Collinearity and
Weak Data in Regression" (Wiley, 1992, rather pricey...).
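
Peter's two points (the interaction being nearly a quadratic, and the fit
being sensitive to tiny changes in the data) can be checked on simulated
data.  What follows is only a rough sketch: x1, x2 and y are made up and
have nothing to do with the baseball data, and the VIF is computed by hand
as 1 / (1 - R^2) rather than with any add-on package.

```r
## Simulated illustration only; x1, x2 and y are hypothetical data.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)      # x2 is nearly a copy of x1
y  <- 1 + x1 + x2 + rnorm(n, sd = 0.5)

## With two nearly identical predictors, the interaction column x1 * x2
## is almost the quadratic column x1^2.
cor_int_quad <- cor(x1 * x2, x1^2)

## Variance inflation factor for x1, by hand: regress x1 on the other
## predictor and take 1 / (1 - R^2).
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)

## Refit after rounding both predictors to two decimals, i.e. after a
## change in the third decimal place.
fit_orig  <- lm(y ~ x1 + x2)
x1r <- round(x1, 2)
x2r <- round(x2, 2)
fit_round <- lm(y ~ x1r + x2r)

print(c(cor_int_quad = cor_int_quad, vif_x1 = vif_x1))
print(rbind(original = unname(coef(fit_orig)),
            rounded  = unname(coef(fit_round))))
```

Compare the two rows of coefficients: the individual slopes for the two
predictors can move around a good deal between the fits, while their sum
stays close to 2, which is about all the data can really pin down here.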

Hocking has a plot that shows the effect of collinearity in a paper from the
early '80s (the "picket fence").  The plot is used on the cover of his
latest linear model book, also published by Wiley, now in 2nd edition.  

[An exercise for R newbies:  Try reproducing that plot in R, probably using
the scatterplot3d package.]
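
For anyone who wants a starting point, here is one possible sketch.  It is
not a reproduction of Hocking's figure: the data are simulated, the variable
names are made up, and it assumes the scatterplot3d package from CRAN is
installed (the code just prints a message otherwise).  The idea is that with
collinear predictors the points stand in a narrow row, like the pickets of a
fence, and the fitted plane is balanced on top of that row.

```r
## Hypothetical simulated data; not Hocking's example.
set.seed(2)
n  <- 30
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 0.02)      # points hug the line x2 = x1
y  <- 1 + x1 + x2 + rnorm(n, sd = 0.3)
fit <- lm(y ~ x1 + x2)

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  if (!interactive()) pdf(NULL)     # no plot file when run as a script
  ## type = "h" drops a vertical line from each point: the "pickets".
  s3d <- scatterplot3d::scatterplot3d(x1, x2, y, type = "h", pch = 16,
                                      main = "Collinear predictors")
  s3d$plane3d(fit, lty = "dashed")  # fitted regression plane
  if (!interactive()) dev.off()
} else {
  message("install.packages(\"scatterplot3d\") to draw the plot")
}
```

Changing the seed, or the sd of the noise in x2, shows how easily the plane
tilts about the row of points.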

Best,
Andy

 
> HTH
> 
> Peter L. Flom, PhD
> Assistant Director, Statistics and Data Analysis Core
> Center for Drug Use and HIV Research
> National Development and Research Institutes
> 71 W. 23rd St
> www.peterflom.com
> New York, NY 10010
> (212) 845-4485 (voice)
> (917) 438-0894 (fax)
> 
> 
> >>> "Devshruti Pahuja" <devshruti at hotmail.com> 06/11/04 5:35 AM >>>
> Hi
> 
> I have a set of data with both quantitative and categorical predictors.
> After scaling the response variable, I checked for multicollinearity (VIF
> values) among the predictors and removed the predictors that were hiding
> some of the other significant predictors. I'm curious to know whether the
> predictors that are not significant in a simple 'lm' fit can still be
> involved in interactions. How do I take into account interactions of the
> predictors that I removed just on the basis of multicollinearity?
> 
> I'd appreciate it if someone could throw some light on this matter and on
> how to use R to detect the interactions effectively.
> 
> Thanks
> 
>  Regards
>  Dev
> 
> > ------Final 'lm model'--------------------
> > > logmodelfull_minus_run_hr_walk_batting <-
> >     lm(log(salary) ~ hit + rbi + walk + obp + strike.out +
> >        free.agent.eligible + free.agent.1991 + arbitr.elgible.)
> > > summary(logmodelfull_minus_run_hr_walk_batting)
> >
> > Call:
> > lm(formula = log(salary) ~ hit + rbi + walk + obp + strike.out +
> >     free.agent.eligible + free.agent.1991 + arbitr.elgible.)
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -2.41786 -0.28911 -0.02814  0.31890  1.49007
> >
> > Coefficients:
> >                       Estimate Std. Error t value Pr(>|t|)
> > (Intercept)           5.340782   0.251218  21.260  < 2e-16 ***
> > hit                   0.004479   0.001158   3.867 0.000133 ***
> > rbi                   0.011102   0.002195   5.059 7.05e-07 ***
> > walk                  0.005421   0.002206   2.457 0.014533 *
> > obp                  -1.385584   0.824105  -1.681 0.093653 .
> > strike.out           -0.005399   0.001438  -3.755 0.000205 ***
> > free.agent.eligible1  1.611521   0.080657  19.980  < 2e-16 ***
> > free.agent.19911     -0.301243   0.103481  -2.911 0.003848 **
> > arbitr.elgible.1      1.293059   0.086696  14.915  < 2e-16 ***
> > ---
> > Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> > Residual standard error: 0.5351 on 328 degrees of freedom
> > Multiple R-Squared: 0.7981,     Adjusted R-squared: 0.7932
> > F-statistic: 162.1 on 8 and 328 DF,  p-value: < 2.2e-16
> >
> >
> > ------with interactions--------------------
> >
> > > summary(baseball.lgmodel_with_interactions_ALL_arbid)
> >
> > Call:
> > lm(formula = log(salary) ~ hit + rbi + strike.out + free.agent.eligible +
> >     free.agent.1991 + arbitr.elgible. + hit * free.agent.1991 +
> >     hit * arbitr.elgible. + hit * rbi + rbi * free.agent.eligible +
> >     rbi * arbitr.elgible. + rbi * arbitr.1991 + hit * strike.out +
> >     strike.out * free.agent.eligible + strike.out * arbitr.elgible. +
> >     strike.out * run + strike.out * hr + hit * free.agent.eligible +
> >     free.agent.eligible * run + hit * free.agent.1991 + strike.out *
> >     free.agent.1991 + free.agent.1991 * batting + free.agent.1991 *
> >     obp + arbitr.elgible. * run + batting * double + obp * run +
> >     obp * hr + walk * stolen.base + hit * arbitr.1991 +
> >     free.agent.eligible * double + arbitr.elgible. * double +
> >     strike.out * triple + triple * batting + triple * walk +
> >     triple * walk + hit * hr + rbi * hr + free.agent.eligible * hr +
> >     free.agent.1991 * hr + arbitr.elgible. * hr + hr * arbitr.1991 +
> >     hit * walk + free.agent.eligible * walk + walk * rbi +
> >     rbi * stolen.base + strike.out * stolen.base +
> >     stolen.base * batting + stolen.base * walk + stolen.base * rbi +
> >     stolen.base * walk + arbitr.elgible. * error)
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -2.29352 -0.28287 -0.03748  0.29790  1.31590
> >
> > Coefficients:
> >                                   Estimate Std. Error t value Pr(>|t|)
> > (Intercept)                      5.217e+00  3.467e-01  15.048  < 2e-16 ***
> > hit                              6.927e-03  6.226e-03   1.112 0.266889
> > rbi                              1.908e-02  1.150e-02   1.658 0.098350 .
> > strike.out                      -5.692e-03  4.586e-03  -1.241 0.215517
> > free.agent.eligible1             1.287e+00  2.259e-01   5.699 3.05e-08 ***
> > free.agent.19911                 3.828e-01  6.575e-01   0.582 0.560914
> > arbitr.elgible.1                 1.038e+00  2.195e-01   4.726 3.63e-06 ***
> > arbitr.19911                    -1.024e+00  4.392e-01  -2.331 0.020443 *
> > run                              4.932e-02  2.905e-02   1.698 0.090682 .
> > hr                              -1.093e-01  7.208e-02  -1.516 0.130543
> > batting                         -1.814e-01  2.558e+00  -0.071 0.943522
> > obp                             -1.375e+00  2.253e+00  -0.610 0.542099
> > double                          -5.259e-02  4.489e-02  -1.172 0.242349
> > walk                             1.395e-02  9.757e-03   1.430 0.153808
> > stolen.base                     -1.685e-02  4.299e-02  -0.392 0.695372
> > triple                          -1.367e-01  1.600e-01  -0.854 0.393807
> > error                           -4.097e-03  6.879e-03  -0.595 0.552007
> > hit:free.agent.19911             8.248e-04  4.611e-03   0.179 0.858174
> > hit:arbitr.elgible.1             4.873e-03  6.448e-03   0.756 0.450395
> > hit:rbi                         -1.382e-04  7.709e-05  -1.792 0.074184 .
> > rbi:free.agent.eligible1         5.352e-03  9.555e-03   0.560 0.575855
> > rbi:arbitr.elgible.1            -3.384e-03  1.136e-02  -0.298 0.766072
> > rbi:arbitr.19911                 3.596e-02  2.179e-02   1.650 0.100046
> > hit:strike.out                   5.480e-06  5.446e-05   0.101 0.919917
> > strike.out:free.agent.eligible1 -2.570e-03  4.282e-03  -0.600 0.548890
> > strike.out:arbitr.elgible.1     -9.703e-04  5.234e-03  -0.185 0.853068
> > strike.out:run                   1.685e-04  1.246e-04   1.352 0.177345
> > strike.out:hr                   -3.088e-04  2.277e-04  -1.356 0.176229
> > hit:free.agent.eligible1        -1.359e-03  6.224e-03  -0.218 0.827363
> > free.agent.eligible1:run         1.248e-02  9.109e-03   1.370 0.171917
> > strike.out:free.agent.19911     -1.851e-02  5.974e-03  -3.099 0.002140 **
> > free.agent.19911:batting         7.076e-01  6.200e+00   0.114 0.909215
> > free.agent.19911:obp            -1.421e+00  3.952e+00  -0.360 0.719394
> > arbitr.elgible.1:run            -8.541e-03  8.773e-03  -0.974 0.331100
> > batting:double                   2.346e-01  1.609e-01   1.458 0.145884
> > run:obp                         -1.825e-01  7.492e-02  -2.436 0.015462 *
> > hr:obp                           3.687e-01  2.116e-01   1.742 0.082608 .
> > walk:stolen.base                -6.789e-05  1.557e-04  -0.436 0.663083
> > hit:arbitr.19911                -5.835e-03  7.084e-03  -0.824 0.410808
> > free.agent.eligible1:double     -1.151e-02  1.663e-02  -0.692 0.489334
> > arbitr.elgible.1:double          2.169e-03  1.938e-02   0.112 0.910985
> > strike.out:triple               -8.106e-04  6.023e-04  -1.346 0.179475
> > batting:triple                   5.179e-01  5.599e-01   0.925 0.355841
> > walk:triple                      8.755e-04  9.262e-04   0.945 0.345349
> > hit:hr                          -3.320e-04  2.626e-04  -1.264 0.207180
> > rbi:hr                           4.748e-04  3.015e-04   1.575 0.116414
> > free.agent.eligible1:hr          1.840e-02  2.313e-02   0.796 0.426972
> > free.agent.19911:hr              7.216e-02  1.889e-02   3.819 0.000165 ***
> > arbitr.elgible.1:hr              4.111e-02  2.803e-02   1.467 0.143564
> > arbitr.19911:hr                 -2.368e-02  4.647e-02  -0.510 0.610723
> > hit:walk                         3.173e-05  7.826e-05   0.405 0.685442
> > free.agent.eligible1:walk       -5.423e-03  4.984e-03  -1.088 0.277472
> > rbi:walk                        -7.569e-05  1.313e-04  -0.577 0.564598
> > rbi:stolen.base                  3.980e-05  1.605e-04   0.248 0.804409
> > strike.out:stolen.base          -2.611e-04  1.615e-04  -1.617 0.107004
> > batting:stolen.base              1.552e-01  1.434e-01   1.082 0.280020
> > arbitr.elgible.1:error           3.930e-03  1.390e-02   0.283 0.777495
> > ---
> > Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> > Residual standard error: 0.4925 on 280 degrees of freedom
> > Multiple R-Squared: 0.854,      Adjusted R-squared: 0.8248
> > F-statistic: 29.24 on 56 and 280 DF,  p-value: < 2.2e-16
> >
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html