[R] lm with a single X and step with several Xi-s, beta coef. quite different:

Ista Zahn istazahn at gmail.com
Wed Aug 8 15:49:49 CEST 2012


Hi,

Sounds like suppression -- see e.g.,
http://www.jstor.org/stable/2988294?seq=1 for a discussion.
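
As a rough illustration of what suppression can look like, here is a small
simulation. It uses made-up data, not yours: the variable names, sample size
and coefficients are all assumptions chosen for the sketch. x1 and x2 are
strongly correlated and act on y in opposite directions, so x1 alone shows
only a weak slope, while its partial slope with x2 in the model is much
larger in magnitude. By construction the partial slope is -0.4 and the
marginal slope is about -0.12.

```r
# Simulated illustration of suppression (hypothetical data and names).
set.seed(1)
n  <- 5000
z  <- rnorm(n)                  # component shared by x1 and x2
x1 <- z + rnorm(n, sd = 0.5)
x2 <- z + rnorm(n, sd = 0.5)    # strongly correlated with x1
y  <- -0.4 * x1 + 0.35 * x2 + rnorm(n)

fit_single <- lm(y ~ x1)        # marginal slope of x1
fit_both   <- lm(y ~ x1 + x2)   # partial slope of x1, x2 held constant

b_single <- coef(fit_single)["x1"]
b_both   <- coef(fit_both)["x1"]

print(cor(x1, x2))
print(c(single = unname(b_single), both = unname(b_both)))
```

Comparing summary(fit_single) with summary(fit_both) should show the same
pattern as in your two outputs: a larger coefficient and a larger standard
error for x1 in the joint model, but a much larger t value.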

Since this is not an R question but a statistical one, it may be more
appropriate to post this question to a statistics forum such as
http://stats.stackexchange.com/

Best,
Ista

On Tue, Aug 7, 2012 at 3:28 PM, Aldi Kraja <aldi at wustl.edu> wrote:
> Hi, (R version 2.15.0)
> I am running a program with 1 response (a previously standardized Y) and 44
> independent variables (Xi) from the same data set, a2.
> When I run the 'lm' function on a single Xi at a time, the beta coefficient
> for, let's say, X1 is -0.09682 (se=0.03256).
> But when I run the same Y with all 44 Xi-s through the 'step' function
> (because I left the direction parameter empty, I assume a backward multiple
> regression is performed), 12 Xi-s remain in the final model, X1 is still
> present, and the X1 beta coefficient becomes -0.43402 (se=0.06847).
>
> I did not expect such a drastic change (about 4.5 times larger in magnitude)
> in the beta coefficient, from "lm" with X1 alone (bx1=-0.09682) to "step"
> with the final 12 Xi-s including X1 (bx1=-0.43402).
> I understand that the step function produces partial regression
> coefficients, with all other Xi-s held constant, but is there any good reason
> why X1 in a multiple regression can become so much more significant (from lm
> px1=0.00296 ** to step px1=2.55e-10 ***)?
>
> Some of the 44 Xi-s are correlated with each other, but I am hoping that
> stepwise regression will drop some of the correlated ones.
> The Xi-s are variables coded numerically as 0, 1, 2 so that a linear
> regression can be applied to them.
> For example, the frequency table of X1 is:
> [1] x1
> Levels: x1
>    0    1    2
> 3459  985   96
>
> output of lm(Y ~ X1):
> ==================
>> obj1<-lm(y ~ x1, data=a2)
>> summary(obj1)
>
> Call:
> lm(formula = y ~ x1, data = a2)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -3.3418 -0.7240 -0.0462  0.6577  4.2929
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.03635    0.01781   2.042  0.04124 *
> x1          -0.09682    0.03256  -2.973  0.00296 **
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 1.024 on 4255 degrees of freedom
> Multiple R-squared: 0.002074, Adjusted R-squared: 0.001839
> F-statistic: 8.842 on 1 and 4255 DF, p-value: 0.002961
>
> output from the step function on 44 Xi-s:
> ====================================
> a2 <- na.omit(ac16g761[, 3:(44+2+1)])
> lm.a2 <- lm(y ~ ., data = a2)
> lm.final <- step(lm.a2, trace = FALSE)
> summary(lm.final)
> Call:
> lm(formula = y ~ x1 + x2 +
> x3 + x4 + x5 + x6 + x7 + x8 +
> x9 + x10 + x11 + x12, data = a2)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -3.2955 -0.7210 -0.0611  0.6623  4.1064
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.01065    0.02637   0.404 0.686412
> x1          -0.43402    0.06847  -6.339 2.55e-10 ***
> x2          -0.17109    0.11370  -1.505 0.132464
> x3           0.23552    0.11552   2.039 0.041533 *
> x4          -0.19898    0.10133  -1.964 0.049625 *
> x5           0.06653    0.03796   1.752 0.079769 .
> x6           0.18319    0.08592   2.132 0.033070 *
> x7          -0.17443    0.05095  -3.424 0.000624 ***
> x8           0.24013    0.06516   3.685 0.000232 ***
> x9           0.19202    0.08009   2.398 0.016543 *
> x10         -0.17257    0.05576  -3.095 0.001983 **
> x11         -0.23537    0.05704  -4.126 3.75e-05 ***
> x12          0.25992    0.06260   4.152 3.35e-05 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 1.02 on 4244 degrees of freedom
> Multiple R-squared: 0.01353, Adjusted R-squared: 0.01074
> F-statistic: 4.851 on 12 and 4244 DF, p-value: 5.466e-08
>
> Thank you in advance,
>
> Aldi
>
> P.S. Sorry that I cannot distribute these data for a test.
>


