[R] Regression query : steps for model building

Liaw, Andy andy_liaw at merck.com
Sun Jun 13 03:37:24 CEST 2004


Without going through the copious output, I'll offer the following:

1. What to do will depend a lot on what you want to do with the model you
get in the end.  If you are going to make statistical inferences based on the
model, you'll need to be a lot more careful about how you get to that model.

2. If you have some variables that are highly correlated
(multicollinearity), you won't have much of a chance of finding interactions
among them.  Exact collinearity is the same as confounding, which means you
can't even tell the main effects apart, let alone interactions.
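
To illustrate point 2, here is a minimal sketch on simulated data. The
variables x1, x2, x3 and y are made up for illustration and have nothing to do
with the baseball data below; the variance inflation factor is computed by hand
as 1 / (1 - R^2) rather than with any add-on package.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- 2 * x1                      # exactly collinear with x1
x3 <- x1 + rnorm(n, sd = 0.05)    # nearly (not exactly) collinear with x1
y  <- 1 + x1 + rnorm(n)

# Exact collinearity: lm() cannot separate x1 from x2, so the
# coefficient for x2 is aliased and reported as NA.
fit_exact <- lm(y ~ x1 + x2)
print(coef(fit_exact))

# Near collinearity: the VIF of x1 given x3 is 1 / (1 - R^2),
# where R^2 comes from regressing x1 on x3.
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x3))$r.squared)
print(vif_x1)

# The main effects and the x1:x3 interaction are all estimable here,
# but compare their standard errors with those of a fit on x1 alone.
fit_near <- lm(y ~ x1 * x3)
print(summary(fit_near)$coefficients)
print(summary(lm(y ~ x1))$coefficients)
```

In the first fit the x2 coefficient comes back as NA, which is the confounding
case: the data carry no information to tell x1 and x2 apart.  In the second
case everything is estimable, but it is worth looking at how much the standard
errors grow before reading anything into an interaction term.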

I'd suggest you read Prof. Harrell's "Regression Modeling Strategies",
published by Springer.

Best,
Andy

> From: Devshruti Pahuja
> 
> Hi
> 
> I have a set of data with both quantitative and categorical predictors.
> After scaling the response variable, I looked for multicollinearity (VIF
> values) among the predictors and removed the predictors that were hiding
> some of the other significant predictors. I'm curious to know whether the
> predictors that are not significant in a simple 'lm' fit can still be
> involved in interactions. How do I take into account interactions of the
> predictors that I removed just on the basis of multicollinearity?
> 
> I'd appreciate it if someone could throw some light on this matter and on
> how to use R to detect interactions effectively.
> 
> Thanks
> 
>  Regards
>  Dev
> 
> > ------Final 'lm model'--------------------
> > > logmodelfull_minus_run_hr_walk_batting <- lm(log(salary) ~ hit + rbi +
> >       walk + obp + strike.out + free.agent.eligible + free.agent.1991 +
> >       arbitr.elgible.)
> > > summary(logmodelfull_minus_run_hr_walk_batting)
> >
> > Call:
> > lm(formula = log(salary) ~ hit + rbi + walk + obp + strike.out +
> >     free.agent.eligible + free.agent.1991 + arbitr.elgible.)
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -2.41786 -0.28911 -0.02814  0.31890  1.49007
> >
> > Coefficients:
> >                       Estimate Std. Error t value Pr(>|t|)
> > (Intercept)           5.340782   0.251218  21.260  < 2e-16 ***
> > hit                   0.004479   0.001158   3.867 0.000133 ***
> > rbi                   0.011102   0.002195   5.059 7.05e-07 ***
> > walk                  0.005421   0.002206   2.457 0.014533 *
> > obp                  -1.385584   0.824105  -1.681 0.093653 .
> > strike.out           -0.005399   0.001438  -3.755 0.000205 ***
> > free.agent.eligible1  1.611521   0.080657  19.980  < 2e-16 ***
> > free.agent.19911     -0.301243   0.103481  -2.911 0.003848 **
> > arbitr.elgible.1      1.293059   0.086696  14.915  < 2e-16 ***
> > ---
> > Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> > Residual standard error: 0.5351 on 328 degrees of freedom
> > Multiple R-Squared: 0.7981,     Adjusted R-squared: 0.7932
> > F-statistic: 162.1 on 8 and 328 DF,  p-value: < 2.2e-16
> >
> >
> > ------with interactions--------------------
> >
> > > summary(baseball.lgmodel_with_interactions_ALL_arbid)
> >
> > Call:
> > lm(formula = log(salary) ~ hit + rbi + strike.out + free.agent.eligible +
> >     free.agent.1991 + arbitr.elgible. + hit * free.agent.1991 +
> >     hit * arbitr.elgible. + hit * rbi + rbi * free.agent.eligible +
> >     rbi * arbitr.elgible. + rbi * arbitr.1991 + hit * strike.out +
> >     strike.out * free.agent.eligible + strike.out * arbitr.elgible. +
> >     strike.out * run + strike.out * hr + hit * free.agent.eligible +
> >     free.agent.eligible * run + hit * free.agent.1991 + strike.out *
> >     free.agent.1991 + free.agent.1991 * batting + free.agent.1991 *
> >     obp + arbitr.elgible. * run + batting * double + obp * run +
> >     obp * hr + walk * stolen.base + hit * arbitr.1991 + free.agent.eligible *
> >     double + arbitr.elgible. * double + strike.out * triple +
> >     triple * batting + triple * walk + triple * walk + hit *
> >     hr + rbi * hr + free.agent.eligible * hr + free.agent.1991 *
> >     hr + arbitr.elgible. * hr + hr * arbitr.1991 + hit * walk +
> >     free.agent.eligible * walk + walk * rbi + rbi * stolen.base +
> >     strike.out * stolen.base + stolen.base * batting + stolen.base *
> >     walk + stolen.base * rbi + stolen.base * walk + arbitr.elgible. *
> >     error)
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -2.29352 -0.28287 -0.03748  0.29790  1.31590
> >
> > Coefficients:
> >                                   Estimate Std. Error t value Pr(>|t|)
> > (Intercept)                      5.217e+00  3.467e-01  15.048  < 2e-16 ***
> > hit                              6.927e-03  6.226e-03   1.112 0.266889
> > rbi                              1.908e-02  1.150e-02   1.658 0.098350 .
> > strike.out                      -5.692e-03  4.586e-03  -1.241 0.215517
> > free.agent.eligible1             1.287e+00  2.259e-01   5.699 3.05e-08 ***
> > free.agent.19911                 3.828e-01  6.575e-01   0.582 0.560914
> > arbitr.elgible.1                 1.038e+00  2.195e-01   4.726 3.63e-06 ***
> > arbitr.19911                    -1.024e+00  4.392e-01  -2.331 0.020443 *
> > run                              4.932e-02  2.905e-02   1.698 0.090682 .
> > hr                              -1.093e-01  7.208e-02  -1.516 0.130543
> > batting                         -1.814e-01  2.558e+00  -0.071 0.943522
> > obp                             -1.375e+00  2.253e+00  -0.610 0.542099
> > double                          -5.259e-02  4.489e-02  -1.172 0.242349
> > walk                             1.395e-02  9.757e-03   1.430 0.153808
> > stolen.base                     -1.685e-02  4.299e-02  -0.392 0.695372
> > triple                          -1.367e-01  1.600e-01  -0.854 0.393807
> > error                           -4.097e-03  6.879e-03  -0.595 0.552007
> > hit:free.agent.19911             8.248e-04  4.611e-03   0.179 0.858174
> > hit:arbitr.elgible.1             4.873e-03  6.448e-03   0.756 0.450395
> > hit:rbi                         -1.382e-04  7.709e-05  -1.792 0.074184 .
> > rbi:free.agent.eligible1         5.352e-03  9.555e-03   0.560 0.575855
> > rbi:arbitr.elgible.1            -3.384e-03  1.136e-02  -0.298 0.766072
> > rbi:arbitr.19911                 3.596e-02  2.179e-02   1.650 0.100046
> > hit:strike.out                   5.480e-06  5.446e-05   0.101 0.919917
> > strike.out:free.agent.eligible1 -2.570e-03  4.282e-03  -0.600 0.548890
> > strike.out:arbitr.elgible.1     -9.703e-04  5.234e-03  -0.185 0.853068
> > strike.out:run                   1.685e-04  1.246e-04   1.352 0.177345
> > strike.out:hr                   -3.088e-04  2.277e-04  -1.356 0.176229
> > hit:free.agent.eligible1        -1.359e-03  6.224e-03  -0.218 0.827363
> > free.agent.eligible1:run         1.248e-02  9.109e-03   1.370 0.171917
> > strike.out:free.agent.19911     -1.851e-02  5.974e-03  -3.099 0.002140 **
> > free.agent.19911:batting         7.076e-01  6.200e+00   0.114 0.909215
> > free.agent.19911:obp            -1.421e+00  3.952e+00  -0.360 0.719394
> > arbitr.elgible.1:run            -8.541e-03  8.773e-03  -0.974 0.331100
> > batting:double                   2.346e-01  1.609e-01   1.458 0.145884
> > run:obp                         -1.825e-01  7.492e-02  -2.436 0.015462 *
> > hr:obp                           3.687e-01  2.116e-01   1.742 0.082608 .
> > walk:stolen.base                -6.789e-05  1.557e-04  -0.436 0.663083
> > hit:arbitr.19911                -5.835e-03  7.084e-03  -0.824 0.410808
> > free.agent.eligible1:double     -1.151e-02  1.663e-02  -0.692 0.489334
> > arbitr.elgible.1:double          2.169e-03  1.938e-02   0.112 0.910985
> > strike.out:triple               -8.106e-04  6.023e-04  -1.346 0.179475
> > batting:triple                   5.179e-01  5.599e-01   0.925 0.355841
> > walk:triple                      8.755e-04  9.262e-04   0.945 0.345349
> > hit:hr                          -3.320e-04  2.626e-04  -1.264 0.207180
> > rbi:hr                           4.748e-04  3.015e-04   1.575 0.116414
> > free.agent.eligible1:hr          1.840e-02  2.313e-02   0.796 0.426972
> > free.agent.19911:hr              7.216e-02  1.889e-02   3.819 0.000165 ***
> > arbitr.elgible.1:hr              4.111e-02  2.803e-02   1.467 0.143564
> > arbitr.19911:hr                 -2.368e-02  4.647e-02  -0.510 0.610723
> > hit:walk                         3.173e-05  7.826e-05   0.405 0.685442
> > free.agent.eligible1:walk       -5.423e-03  4.984e-03  -1.088 0.277472
> > rbi:walk                        -7.569e-05  1.313e-04  -0.577 0.564598
> > rbi:stolen.base                  3.980e-05  1.605e-04   0.248 0.804409
> > strike.out:stolen.base          -2.611e-04  1.615e-04  -1.617 0.107004
> > batting:stolen.base              1.552e-01  1.434e-01   1.082 0.280020
> > arbitr.elgible.1:error           3.930e-03  1.390e-02   0.283 0.777495
> > ---
> > Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> > Residual standard error: 0.4925 on 280 degrees of freedom
> > Multiple R-Squared: 0.854,      Adjusted R-squared: 0.8248
> > F-statistic: 29.24 on 56 and 280 DF,  p-value: < 2.2e-16
> >