[R] Why terms are dropping out of an lm() model
John Pitney
john at pitney.org
Thu Aug 26 19:54:12 CEST 2004
Hi all!
I'm fairly new to R and not too experienced with regression. Because
of one or both of those traits, I'm not seeing why some terms are being
dropped from my model when doing a regression using lm().
I am trying to do a regression on some experimental data d, which has
two numeric predictors, p1 and p2, and one numeric response, r. The aim
is to compare polynomial models in p1 and p2 up to third order. I don't
understand why lm() doesn't return coefficients for the p1^3 and p2^3
terms. Similar loss of terms happened when I tried orthonormal
polynomials to third order.
I'm satisfied with the second-order regression, by the way, but I'd
still like to understand why the third-order regression doesn't work
like I'd expect.
Can anyone offer a pointer to help me understand this?
Here's what I'm seeing in R 1.9.1 for Windows. Note the NA's for p1^3
and p2^3 in the last summary.
> d <- read.csv("DOE_data.csv")
> d$p1
[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0
[34] 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0
[67] 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
> d$p2
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
[34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
[67] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> summary(d$r)
Min. 1st Qu. Median Mean 3rd Qu. Max.
18.68 19.88 21.94 21.48 22.64 24.36
> summary(d.lm1 <- lm(r ~ p1 + p2, data=d))
Call:
lm(formula = r ~ p1 + p2, data = d)
Residuals:
Min 1Q Median 3Q Max
-0.58107 -0.09248 0.02492 0.26061 0.49617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.66417 0.06591 283.17 <2e-16 ***
p1 1.96145 0.04036 48.60 <2e-16 ***
p2 0.85801 0.04036 21.26 <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.3126 on 87 degrees of freedom
Multiple R-Squared: 0.97, Adjusted R-squared: 0.9693
F-statistic: 1407 on 2 and 87 DF, p-value: < 2.2e-16
> summary(d.lm2 <- update(d.lm1, . ~ . + I(p1^2) + I(p2^2) + I(p1 * p2)))
Call:
lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2), data = d)
Residuals:
Min 1Q Median 3Q Max
-0.106813 -0.021568 0.003214 0.025083 0.084687
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.701098 0.011265 1660.14 <2e-16 ***
p1 2.674525 0.019511 137.08 <2e-16 ***
p2 0.984765 0.019511 50.47 <2e-16 ***
I(p1^2) -0.489210 0.008875 -55.12 <2e-16 ***
I(p2^2) -0.196050 0.008875 -22.09 <2e-16 ***
I(p1 * p2) 0.265345 0.006275 42.28 <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.03969 on 84 degrees of freedom
Multiple R-Squared: 0.9995, Adjusted R-squared: 0.9995
F-statistic: 3.598e+04 on 5 and 84 DF, p-value: < 2.2e-16
> summary(d.lm3 <- update(d.lm2, . ~ . + I(p1^3) + I(p2^3) + I(p1^2*p2) +
I(p1*p2^2)))
Call:
lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2) + I(p1^3) +
I(p2^3) + I(p1^2 * p2) + I(p1 * p2^2), data = d)
Residuals:
Min 1Q Median 3Q Max
-0.089823 -0.017707 0.001952 0.020820 0.059302
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.728958 0.009657 1939.365 < 2e-16 ***
p1 2.604190 0.022970 113.376 < 2e-16 ***
p2 0.860080 0.022970 37.444 < 2e-16 ***
I(p1^2) -0.463725 0.010950 -42.348 < 2e-16 ***
I(p2^2) -0.137955 0.010950 -12.598 < 2e-16 ***
I(p1 * p2) 0.432505 0.024486 17.664 < 2e-16 ***
I(p1^3) NA NA NA NA
I(p2^3) NA NA NA NA
I(p1^2 * p2) -0.025485 0.008482 -3.005 0.00353 **
I(p1 * p2^2) -0.058095 0.008482 -6.849 1.26e-09 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.03097 on 82 degrees of freedom
Multiple R-Squared: 0.9997, Adjusted R-squared: 0.9997
F-statistic: 4.221e+04 on 7 and 82 DF, p-value: < 2.2e-16
Thanks and best regards,
John
More information about the R-help
mailing list