[R] Varying statistical significance in estimates of linear model

Stathis Kamperis ekamperi at gmail.com
Thu Aug 8 12:43:27 CEST 2013


Hi everyone,

I have a response variable 'y' and several predictor variables 'x_i'.
I start with a linear model:

m1 <- lm(y ~ x1); summary(m1)

and I get a statistically significant estimate for 'x1'. Then, I
modify my model as:

m2 <- lm(y ~ x1 + x2); summary(m2)

At this point, the estimate for x1 may become non-significant while
the estimate for x2 is significant.
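
A minimal sketch of the shift described above, using simulated data
rather than my real data. The seed, the sample size n, and the way x1
and x2 are generated are assumptions made up purely for illustration:
x1 is built as a noisy copy of x2, and y depends on x2 only. Under
these assumptions x1 is significant on its own, and it will typically
lose significance once x2 enters the model.

```r
# Simulated data only: x1 and x2 are strongly correlated by construction,
# and y depends on x2 alone (all of this is an illustrative assumption).
set.seed(1)
n  <- 200
x2 <- rnorm(n)
x1 <- x2 + rnorm(n, sd = 0.5)   # x1 is a noisy copy of x2
y  <- 2 * x2 + rnorm(n)

m1 <- lm(y ~ x1)        # x1 alone picks up the effect of x2
m2 <- lm(y ~ x1 + x2)   # x2 now competes with x1 for the same signal

# p-values of the slopes in each model
p1 <- summary(m1)$coefficients["x1", "Pr(>|t|)"]
p2 <- summary(m2)$coefficients[c("x1", "x2"), "Pr(>|t|)"]
print(p1)
print(p2)

# Correlation between the two predictors
print(cor(x1, x2))
```

Comparing coef(m1) with coef(m2) shows the same thing from the other
side: the slope of x1 shrinks once x2 is in the model.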

As I add more predictor variables (or interaction terms), the set of
estimates that come out statistically significant varies. So sometimes
x1, x2, x6 are significant, while at other times x2, x4, x5 are.

It seems to me that I could tweak my model in such a way (by
adding/removing predictor variables or "suitable" interaction terms)
that I could "prove" whatever I'd like to prove.
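
To illustrate the worry, here is a rough sketch with simulated data
(again not my real data; the seed, n, k and the variable names are
made up). Here y is pure noise, unrelated to any of the 20 candidate
predictors, yet trying them one at a time will often turn up one or
more with p < 0.05 by chance alone.

```r
# Simulated data only: y is pure noise, unrelated to any predictor.
set.seed(42)
n <- 100   # observations (illustrative)
k <- 20    # candidate predictors (illustrative)
X <- as.data.frame(matrix(rnorm(n * k), nrow = n, ncol = k))
names(X) <- paste0("x", 1:k)
y <- rnorm(n)

# Fit one single-predictor model per column and keep the p-value
# of its slope (row 2 of the coefficient table; row 1 is the intercept).
pvals <- sapply(names(X), function(v) {
  summary(lm(y ~ X[[v]]))$coefficients[2, "Pr(>|t|)"]
})

# Number of predictors that look "significant" purely by chance
print(sum(pvals < 0.05))
```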

What is the proper methodology here? What do you do in such cases? I
can provide the data if anyone would like to have a look at it.

Best regards,
Stathis Kamperis


