# [R] Two factor ANOVA with lm()

Trenkler, Dietrich dtrenkler at nts6.oec.uni-osnabrueck.de
Mon Aug 23 13:00:16 CEST 2004

The following is a data frame

> "jjd" <- structure(list(Observations = c(6.8, 6.6, 5.3, 6.1,
7.5, 7.4, 7.2, 6.5, 7.8, 9.1, 8.8, 9.1), LevelA = structure(c(1,
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3), .Label = c("A1", "A2",
"A3"), class = "factor"), LevelB = structure(c(1, 1, 2, 2,
1, 1, 2, 2, 1, 1, 2, 2), .Label = c("B1", "B2"), class = "factor")),
.Names = c("Observations", "LevelA", "LevelB"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"),
class = "data.frame")

representing data from

@BOOK{Dobson02,
author = {Annette J. Dobson},
year = 2002,
title = {An Introduction to Generalized Linear Models},
edition = {2.},
publisher = {Chapman \& Hall/CRC},
address = {Boca Raton, Florida, 33431}
}

page 101. To reproduce the estimates c(6.7, 0.75, 1.75, -1.0, 0.4, 1.5)
given on page 103 for a two factor ANOVA, I enter

> jja1 <- lm(Observations~LevelA*LevelB,data=jjd)
> summary(jja1)

I get

Call:
lm(formula = Observations ~ LevelA * LevelB, data = jjd)

Residuals:
Min         1Q     Median         3Q        Max
-6.500e-01 -2.000e-01 -3.469e-17  2.000e-01  6.500e-01

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)         6.7000     0.3512  19.078 1.34e-06 ***
LevelAA2            0.7500     0.4967   1.510   0.1818
LevelAA3            1.7500     0.4967   3.524   0.0125 *
LevelBB2           -1.0000     0.4967  -2.013   0.0907 .
LevelAA2:LevelBB2   0.4000     0.7024   0.569   0.5897
LevelAA3:LevelBB2   1.5000     0.7024   2.136   0.0766 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4967 on 6 degrees of freedom
Multiple R-Squared: 0.9065,     Adjusted R-squared: 0.8286
F-statistic: 11.64 on 5 and 6 DF,  p-value: 0.00481

This is fine. But why do I get these estimates?

Entering

> model.matrix(jja1)

delivers

   (Intercept) LevelAA2 LevelAA3 LevelBB2 LevelAA2:LevelBB2 LevelAA3:LevelBB2
1            1        0        0        0                 0                 0
2            1        0        0        0                 0                 0
3            1        0        0        1                 0                 0
4            1        0        0        1                 0                 0
5            1        1        0        0                 0                 0
6            1        1        0        0                 0                 0
7            1        1        0        1                 1                 0
8            1        1        0        1                 1                 0
9            1        0        1        0                 0                 0
10           1        0        1        0                 0                 0
11           1        0        1        1                 0                 1
12           1        0        1        1                 0                 1
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$LevelA
[1] "contr.treatment"
attr(,"contrasts")$LevelB
[1] "contr.treatment"

which shows that internally lm() seems to use corner point constraints
of the form

$\alpha_1=\beta_1=(\alpha\beta)_{11}=(\alpha\beta)_{12}=(\alpha\beta)_{21}=(\alpha\beta)_{31}=0$

in the model $E[Y_{jkl}]=\mu+\alpha_j+\beta_k+(\alpha\beta)_{jk}$,
$j=1,2,3$, $k=1,2$, $l=1,2$ (Dobson, page 102).
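Indeed, under these constraints the coefficients can be read off the
cell means directly -- a quick check, using the jjd frame from above:

```r
# Cell means of the 3 x 2 layout; with treatment contrasts the intercept
# is the mean of the A1/B1 cell and the other main-effect coefficients
# are offsets from it.
cm <- with(jjd, tapply(Observations, list(LevelA, LevelB), mean))
cm["A1", "B1"]                   # (Intercept): 6.70
cm["A2", "B1"] - cm["A1", "B1"]  # LevelAA2:    0.75
cm["A3", "B1"] - cm["A1", "B1"]  # LevelAA3:    1.75
cm["A1", "B2"] - cm["A1", "B1"]  # LevelBB2:   -1.00
```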

My question is: how can I incorporate restrictions like
$\alpha_1+\alpha_2+\alpha_3=0$, $\beta_1+\beta_2=0$,
$(\alpha\beta)_{21}+(\alpha\beta)_{22}=0$,
$(\alpha\beta)_{31}+(\alpha\beta)_{32}=0$ and
$(\alpha\beta)_{11}+(\alpha\beta)_{21}+(\alpha\beta)_{31}=0$ from the
outset?  Or, put another way: why does lm() use the corner point
constraints by default?  Where can I find documentation for this
behavior?
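So far I have found that the default coding is stored in the contrasts
option, and that the individual coding functions are documented under
?contrasts, ?contr.treatment and ?C:

```r
# Factory default: treatment ("corner point") coding for unordered
# factors, orthogonal polynomials for ordered ones.
options("contrasts")

# contr.treatment drops the first level, hence alpha_1 = beta_1 = 0:
contr.treatment(3)
#   2 3
# 1 0 0
# 2 1 0
# 3 0 1

# contr.sum codes the last level as -1, so each column sums to zero:
contr.sum(3)
```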

I know that I can use something like lm(y~X) where y <- c(6.8, 6.6,
5.3, 6.1, 7.5, 7.4, 7.2, 6.5, 7.8, 9.1, 8.8, 9.1) and X is an
appropriate design matrix.  But I wonder if there is a more direct way.
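One candidate I have come across is the contrasts argument of lm()
(or, globally, options(contrasts = ...)): passing contr.sum for both
factors should build the sum-to-zero design matrix automatically.
A sketch, again with the jjd frame from above:

```r
# Refit with sum-to-zero ("effect") coding for both factors.
jja2 <- lm(Observations ~ LevelA * LevelB, data = jjd,
           contrasts = list(LevelA = "contr.sum", LevelB = "contr.sum"))
coef(jja2)
# In this balanced design the intercept is now the grand mean (7.35);
# the fitted values are unchanged, only the parametrization differs.
```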

D. Trenkler

--
Dietrich Trenkler   Universität Osnabrück
FB Wirtschaftswissenschaften
Rolandstr.8              D-49069 Osnabrück

dtrenkler at nts6.oec.uni-osnabrueck.de
