[R] Question about variable selection

John Fox jfox at mcmaster.ca
Sat Feb 18 20:35:05 CET 2006


Dear Wensui and Andy,

When the explanatory variables are correlated it's perfectly possible for
the marginal relationship between X and Y to be zero and a partial
relationship nonzero (even in the absence of interactions) -- this is simply
a reflection of the more general point that partial and marginal
relationships can differ.
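
For example, a small R simulation makes the point concrete (a minimal
sketch with made-up data; the variable names x1, x2, y are illustrative
and not from the thread): x1 has essentially no marginal relationship
with y, but a clear partial one once x2 is in the model.

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)       # x2 strongly correlated with x1
y  <- x1 - x2 + rnorm(n, sd = 0.3)  # both partial effects are nonzero
summary(lm(y ~ x1))                 # marginal slope of x1 is near zero
summary(lm(y ~ x1 + x2))            # partial slopes are close to +1 and -1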

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Wensui Liu
> Sent: Saturday, February 18, 2006 2:03 PM
> To: Liaw, Andy
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Question about variable selection
> 
> Thank you so much for your reply, Andy.
> 
> But what if I am only interested in main effects instead of 
> interactions?
> 
> 
> 
> On 2/18/06, Liaw, Andy <andy_liaw at merck.com> wrote:
> >
> > That depends on whether the IV could have some significant 
> > interactions with other IVs not considered in the bivariate 
> > analysis.  E.g.,
> >
> > > iv <- expand.grid(-2:2, -2:2)
> > > y <- 3 + iv[,1] * iv[,2] + rnorm(nrow(iv), sd=0.1)
> > > summary(lm(y ~ iv[,1]))
> >
> > Call:
> > lm(formula = y ~ iv[, 1])
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -4.06259 -1.06048 -0.02377  1.05901  4.04315
> >
> > Coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  3.01908    0.41482   7.278 2.09e-07 ***
> > iv[, 1]      0.01417    0.29332   0.048    0.962
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > Residual standard error: 2.074 on 23 degrees of freedom
> > Multiple R-Squared: 0.0001014,  Adjusted R-squared: -0.04337
> > F-statistic: 0.002333 on 1 and 23 DF,  p-value: 0.9619
> >
> > > summary(lm(y ~ iv[,1] * iv[,2]))
> >
> > Call:
> > lm(formula = y ~ iv[, 1] * iv[, 2])
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -0.22390 -0.08894 -0.01279  0.13525  0.17608
> >
> > Coefficients:
> >                  Estimate Std. Error t value Pr(>|t|)
> > (Intercept)      3.019083   0.026330 114.665   <2e-16 ***
> > iv[, 1]          0.014167   0.018618   0.761    0.455
> > iv[, 2]         -0.005486   0.018618  -0.295    0.771
> > iv[, 1]:iv[, 2]  0.992865   0.013165  75.418   <2e-16 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > Residual standard error: 0.1316 on 21 degrees of freedom
> > Multiple R-Squared: 0.9963,     Adjusted R-squared: 0.9958
> > F-statistic:  1896 on 3 and 21 DF,  p-value: < 2.2e-16
> >
> >
> >
> >
> > Andy
> >
> > From: Wensui Liu
> > >
> > > Dear Lister,
> > >
> > > I have a question about variable selection for regression.
> > >
> > > if the IV is not significantly related to DV in the bivariate 
> > > analysis, does it make sense to include this IV into the full model 
> > > with multiple IVs?
> > >
> > > Thank you so much!
> > >
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
