[R] Why terms are dropping out of an lm() model

John Pitney john at pitney.org
Thu Aug 26 19:54:12 CEST 2004


Hi all!

I'm fairly new to R and not too experienced with regression.  Because
of one or both of those traits, I'm not seeing why some terms are being
dropped from my model when doing a regression using lm().

I am trying to do a regression on some experimental data d, which has
two numeric predictors, p1 and p2, and one numeric response, r.  The aim
is to compare polynomial models in p1 and p2 up to third order.  I don't
understand why lm() doesn't return coefficients for the p1^3 and p2^3
terms.  Similar loss of terms happened when I tried orthonormal
polynomials to third order.

I'm satisfied with the second-order regression, by the way, but I'd
still like to understand why the third-order regression doesn't work
like I'd expect.

Can anyone offer a pointer to help me understand this?

Here's what I'm seeing in R 1.9.1 for Windows.  Note the NA's for p1^3
and p2^3 in the last summary.

> d <- read.csv("DOE_data.csv")
> d$p1
 [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0
[34] 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0
[67] 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
> d$p2
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
[34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
[67] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> summary(d$r)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.68   19.88   21.94   21.48   22.64   24.36
> summary(d.lm1 <- lm(r ~ p1 + p2, data=d))

Call:
lm(formula = r ~ p1 + p2, data = d)

Residuals:
     Min       1Q   Median       3Q      Max
-0.58107 -0.09248  0.02492  0.26061  0.49617

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.66417    0.06591  283.17   <2e-16 ***
p1           1.96145    0.04036   48.60   <2e-16 ***
p2           0.85801    0.04036   21.26   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.3126 on 87 degrees of freedom
Multiple R-Squared:  0.97,      Adjusted R-squared: 0.9693
F-statistic:  1407 on 2 and 87 DF,  p-value: < 2.2e-16

> summary(d.lm2 <- update(d.lm1, . ~ . + I(p1^2) + I(p2^2) + I(p1 * p2)))

Call:
lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2), data = d)

Residuals:
      Min        1Q    Median        3Q       Max
-0.106813 -0.021568  0.003214  0.025083  0.084687

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.701098   0.011265 1660.14   <2e-16 ***
p1           2.674525   0.019511  137.08   <2e-16 ***
p2           0.984765   0.019511   50.47   <2e-16 ***
I(p1^2)     -0.489210   0.008875  -55.12   <2e-16 ***
I(p2^2)     -0.196050   0.008875  -22.09   <2e-16 ***
I(p1 * p2)   0.265345   0.006275   42.28   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.03969 on 84 degrees of freedom
Multiple R-Squared: 0.9995,     Adjusted R-squared: 0.9995
F-statistic: 3.598e+04 on 5 and 84 DF,  p-value: < 2.2e-16

> summary(d.lm3 <- update(d.lm2, . ~ . + I(p1^3) + I(p2^3) + I(p1^2*p2) +
I(p1*p2^2)))

Call:
lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2) + I(p1^3) +
    I(p2^3) + I(p1^2 * p2) + I(p1 * p2^2), data = d)

Residuals:
      Min        1Q    Median        3Q       Max
-0.089823 -0.017707  0.001952  0.020820  0.059302

Coefficients: (2 not defined because of singularities)
              Estimate Std. Error  t value Pr(>|t|)
(Intercept)  18.728958   0.009657 1939.365  < 2e-16 ***
p1            2.604190   0.022970  113.376  < 2e-16 ***
p2            0.860080   0.022970   37.444  < 2e-16 ***
I(p1^2)      -0.463725   0.010950  -42.348  < 2e-16 ***
I(p2^2)      -0.137955   0.010950  -12.598  < 2e-16 ***
I(p1 * p2)    0.432505   0.024486   17.664  < 2e-16 ***
I(p1^3)             NA         NA       NA       NA
I(p2^3)             NA         NA       NA       NA
I(p1^2 * p2) -0.025485   0.008482   -3.005  0.00353 **
I(p1 * p2^2) -0.058095   0.008482   -6.849 1.26e-09 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.03097 on 82 degrees of freedom
Multiple R-Squared: 0.9997,     Adjusted R-squared: 0.9997
F-statistic: 4.221e+04 on 7 and 82 DF,  p-value: < 2.2e-16

Thanks and best regards,
John




More information about the R-help mailing list