[R] Linear Dependence of Model Matrix and How Fitted Values / Sums of Squares Follow

Justin Thong justinthong93 at gmail.com
Tue Jul 26 17:05:31 CEST 2016


Below are the covariates for a model ~x1+x2+x3+x4+x5+x6. I noticed that when
fitting this model, the coefficient for x6 is not estimable. *Is this merely
a case where adding more columns to the model matrix eventually leads to
linear dependence, so that the more terms I have in the model formula, the
more likely the model matrix is to become linearly dependent?* I found that
for the model formula ~x1+x2+x3+x4+x5 all the coefficients are estimable, so
I guess this example supports my statement.
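
One way to probe this is to check the rank of the model matrix directly. The
sketch below rebuilds the covariates under the assumption that the design is
every blend of 1, 2 or 3 of the six covariates sharing a total of 12 equally,
each repeated 4 times (this matches the 164 rows listed further down; the
helper `blend` and the object names are made up for illustration). If that
reading is right, every row sums to 12, so x1 + ... + x6 equals 12 times the
intercept column, and that exact relation, rather than the sheer number of
terms, would be what makes x6 aliased.

```r
# Hypothetical helper: all blends of k covariates, each getting 12 / k.
blend <- function(k) {
  t(combn(6, k, function(idx) {
    v <- numeric(6)
    v[idx] <- 12 / k
    v
  }))
}

# Stack the 1-, 2- and 3-component blends and repeat each row 4 times.
X <- do.call(rbind, lapply(1:3, blend))
X <- X[rep(seq_len(nrow(X)), each = 4), ]
colnames(X) <- paste0("x", 1:6)
dat <- as.data.frame(X)

# Every row adds up to 12, i.e. to 12 * (Intercept).
row_totals <- unique(rowSums(X))

# With the intercept: 7 columns, but one exact linear relation among them.
M_with <- model.matrix(~ x1 + x2 + x3 + x4 + x5 + x6, data = dat)
rank_with <- qr(M_with)$rank

# Without the intercept: 6 columns and no linear relation left.
M_without <- model.matrix(~ 0 + x1 + x2 + x3 + x4 + x5 + x6, data = dat)
rank_without <- qr(M_without)$rank

c(columns = ncol(M_with), rank = rank_with)
c(columns = ncol(M_without), rank = rank_without)
```

If the two ranks come out as 6 and 6, the dependence comes from the intercept
plus the constant row total, so any one of the seven columns could have been
the one left out; x6 is simply the last in the formula.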

That said, since not all coefficients are estimated, how does R compute the
fitted values and the anova table? *Does it just ignore x6 and treat the
model as ~x1+x2+x3+x4+x5, or is there something deeper that I do not
understand?* I ask because the sums of squares and fitted values seem to be
the same for the model ~x1+x2+x3+x4+x5 as they are for
~x1+x2+x3+x4+x5+x6.
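
The comparison can be made explicit with a small experiment. This is only a
sketch: the post gives no response for these covariates, so `y` below is
simulated noise (an assumption), and the design is rebuilt on the assumption
that it is every equal blend of 1, 2 or 3 covariates totalling 12, repeated
4 times.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Rebuild the 164-row design (assumed structure, see the note above).
X <- do.call(rbind, lapply(1:3, function(k) {
  t(combn(6, k, function(i) replace(numeric(6), i, 12 / k)))
}))
dat <- as.data.frame(X[rep(seq_len(nrow(X)), each = 4), ])
names(dat) <- paste0("x", 1:6)
dat$y <- rnorm(nrow(dat))  # hypothetical response, not from the post

fit5 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = dat)
fit6 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = dat)

coef(fit6)                                  # x6 is reported as NA
all.equal(fitted(fit5), fitted(fit6))       # same fitted values?
all.equal(deviance(fit5), deviance(fit6))   # same residual sum of squares?
rownames(anova(fit6))                       # which terms get a row?
```

Since the column space of the model matrix is the same with or without the
aliased column, the projection of y onto it (the fitted values) and the
residual sum of squares should not depend on whether x6 is in the formula.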

However, this is not so clear-cut for models with factors, because a factor
is represented in the model matrix only by a parameter for each level.
Consider a factor F with 2 levels and a factor G with 3 levels. The problem
is that R excludes certain rows from the anova table. Again, it appears to
exclude the rows associated with parameters that are not estimable, but this
is not entirely clear in my mind. Look at the small example below for the
model ~F*G. As you can see, the interaction parameters F2:G2 and F2:G3 are
not estimable. From what I was told, F1 and G1 are contained within the
(Intercept) parameter, so F1:G1, F1:G2 and F2:G1 are not considered. You can
see from the anova table that the interaction row F:G is omitted. My main
question is why it is omitted.

*Does that mean that if all the parameters associated with a particular term
(excluding the ones absorbed into the intercept) are not estimable, then the
row for that term is dropped from the anova table? How many non-estimable
parameters must there be for the row of a term to be dropped?* I ask because
if the answer to the second question above is that fitted values and sums of
squares are calculated by ignoring non-estimable parameters, then the rows
of sums of squares must disappear for some reason other than
non-estimability.

Sorry for the generally wordy question. I may not be thinking about it in
the correct manner, and I would appreciate it if anyone has an answer, and
perhaps even some generalisations in terms of the QR decomposition.
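
On the QR side, here is a minimal sketch of what a pivoted QR reports for a
rank-deficient matrix. The toy matrix and its column names are invented for
illustration; `qr()` with its default (LINPACK) method is, as far as I
understand, the same kind of decomposition that `lm()` relies on.

```r
# Toy design (hypothetical): column s is exactly a + b.
X <- cbind(a = c(1, 0, 1, 0),
           b = c(0, 1, 0, 1),
           s = c(1, 1, 1, 1),
           d = c(1, 2, 3, 4))
y <- c(1, 3, 2, 5)  # made-up response

q <- qr(X)
q$rank    # number of linearly independent columns found
q$pivot   # column order after pivoting; dependent columns go last

estimable <- colnames(X)[q$pivot[seq_len(q$rank)]]
aliased   <- colnames(X)[q$pivot[-seq_len(q$rank)]]

# Fitted values use only the first q$rank pivoted columns, so they match
# a fit on the estimable columns alone.
fit_all  <- qr.fitted(q, y)
fit_kept <- qr.fitted(qr(X[, estimable]), y)
all.equal(fit_all, fit_kept)
```

In this picture a column is dropped when it adds nothing to the span of the
columns before it, which is why the order of terms in the formula decides
which coefficient ends up as NA.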

(There is more code below this data)

    x1 x2 x3 x4 x5 x6
1   12  0  0  0  0  0
2   12  0  0  0  0  0
3   12  0  0  0  0  0
4   12  0  0  0  0  0
5    0 12  0  0  0  0
6    0 12  0  0  0  0
7    0 12  0  0  0  0
8    0 12  0  0  0  0
9    0  0 12  0  0  0
10   0  0 12  0  0  0
11   0  0 12  0  0  0
12   0  0 12  0  0  0
13   0  0  0 12  0  0
14   0  0  0 12  0  0
15   0  0  0 12  0  0
16   0  0  0 12  0  0
17   0  0  0  0 12  0
18   0  0  0  0 12  0
19   0  0  0  0 12  0
20   0  0  0  0 12  0
21   0  0  0  0  0 12
22   0  0  0  0  0 12
23   0  0  0  0  0 12
24   0  0  0  0  0 12
25   6  6  0  0  0  0
26   6  6  0  0  0  0
27   6  6  0  0  0  0
28   6  6  0  0  0  0
29   6  0  6  0  0  0
30   6  0  6  0  0  0
31   6  0  6  0  0  0
32   6  0  6  0  0  0
33   6  0  0  6  0  0
34   6  0  0  6  0  0
35   6  0  0  6  0  0
36   6  0  0  6  0  0
37   6  0  0  0  6  0
38   6  0  0  0  6  0
39   6  0  0  0  6  0
40   6  0  0  0  6  0
41   6  0  0  0  0  6
42   6  0  0  0  0  6
43   6  0  0  0  0  6
44   6  0  0  0  0  6
45   0  6  6  0  0  0
46   0  6  6  0  0  0
47   0  6  6  0  0  0
48   0  6  6  0  0  0
49   0  6  0  6  0  0
50   0  6  0  6  0  0
51   0  6  0  6  0  0
52   0  6  0  6  0  0
53   0  6  0  0  6  0
54   0  6  0  0  6  0
55   0  6  0  0  6  0
56   0  6  0  0  6  0
57   0  6  0  0  0  6
58   0  6  0  0  0  6
59   0  6  0  0  0  6
60   0  6  0  0  0  6
61   0  0  6  6  0  0
62   0  0  6  6  0  0
63   0  0  6  6  0  0
64   0  0  6  6  0  0
65   0  0  6  0  6  0
66   0  0  6  0  6  0
67   0  0  6  0  6  0
68   0  0  6  0  6  0
69   0  0  6  0  0  6
70   0  0  6  0  0  6
71   0  0  6  0  0  6
72   0  0  6  0  0  6
73   0  0  0  6  6  0
74   0  0  0  6  6  0
75   0  0  0  6  6  0
76   0  0  0  6  6  0
77   0  0  0  6  0  6
78   0  0  0  6  0  6
79   0  0  0  6  0  6
80   0  0  0  6  0  6
81   0  0  0  0  6  6
82   0  0  0  0  6  6
83   0  0  0  0  6  6
84   0  0  0  0  6  6
85   4  4  4  0  0  0
86   4  4  4  0  0  0
87   4  4  4  0  0  0
88   4  4  4  0  0  0
89   4  4  0  4  0  0
90   4  4  0  4  0  0
91   4  4  0  4  0  0
92   4  4  0  4  0  0
93   4  4  0  0  4  0
94   4  4  0  0  4  0
95   4  4  0  0  4  0
96   4  4  0  0  4  0
97   4  4  0  0  0  4
98   4  4  0  0  0  4
99   4  4  0  0  0  4
100  4  4  0  0  0  4
101  4  0  4  4  0  0
102  4  0  4  4  0  0
103  4  0  4  4  0  0
104  4  0  4  4  0  0
105  4  0  4  0  4  0
106  4  0  4  0  4  0
107  4  0  4  0  4  0
108  4  0  4  0  4  0
109  4  0  4  0  0  4
110  4  0  4  0  0  4
111  4  0  4  0  0  4
112  4  0  4  0  0  4
113  4  0  0  4  4  0
114  4  0  0  4  4  0
115  4  0  0  4  4  0
116  4  0  0  4  4  0
117  4  0  0  4  0  4
118  4  0  0  4  0  4
119  4  0  0  4  0  4
120  4  0  0  4  0  4
121  4  0  0  0  4  4
122  4  0  0  0  4  4
123  4  0  0  0  4  4
124  4  0  0  0  4  4
125  0  4  4  4  0  0
126  0  4  4  4  0  0
127  0  4  4  4  0  0
128  0  4  4  4  0  0
129  0  4  4  0  4  0
130  0  4  4  0  4  0
131  0  4  4  0  4  0
132  0  4  4  0  4  0
133  0  4  4  0  0  4
134  0  4  4  0  0  4
135  0  4  4  0  0  4
136  0  4  4  0  0  4
137  0  4  0  4  4  0
138  0  4  0  4  4  0
139  0  4  0  4  4  0
140  0  4  0  4  4  0
141  0  4  0  4  0  4
142  0  4  0  4  0  4
143  0  4  0  4  0  4
144  0  4  0  4  0  4
145  0  4  0  0  4  4
146  0  4  0  0  4  4
147  0  4  0  0  4  4
148  0  4  0  0  4  4
149  0  0  4  4  4  0
150  0  0  4  4  4  0
151  0  0  4  4  4  0
152  0  0  4  4  4  0
153  0  0  4  4  0  4
154  0  0  4  4  0  4
155  0  0  4  4  0  4
156  0  0  4  4  0  4
157  0  0  4  0  4  4
158  0  0  4  0  4  4
159  0  0  4  0  4  4
160  0  0  4  0  4  4
161  0  0  0  4  4  4
162  0  0  0  4  4  4
163  0  0  0  4  4  4
164  0  0  0  4  4  4

F <- factor(c(rep(1,3), rep(2,3)))             # 2 levels
G <- factor(c(rep(1,2), rep(2,2), rep(3,2)))   # 3 levels
H <- F                                         # copy of F; not used below
y <- rnorm(6, 2)                               # random response with mean 2
test3 <- aov(y ~ F*G)

model.matrix(test3)

  (Intercept) F2 G2 G3 F2:G2 F2:G3
1           1  0  0  0     0     0
2           1  0  0  0     0     0
3           1  0  1  0     0     0
4           1  1  1  0     1     0
5           1  1  0  1     0     1
6           1  1  0  1     0     1
attr(,"assign")
[1] 0 1 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$F
[1] "contr.treatment"

attr(,"contrasts")$G
[1] "contr.treatment"

alias(test3)

Model :
y ~ F * G

Complete :
      (Intercept) F2 G2 G3
F2:G2  0           1  0 -1
F2:G3  0           0  0  1

summary(test3)

            Df Sum Sq Mean Sq F value Pr(>F)
F            1 0.0479  0.0479   0.059  0.830
G            2 0.9762  0.4881   0.604  0.624
Residuals    2 1.6175  0.8087
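
A sketch of how the missing F:G row could be traced, for what it is worth. As
far as I can tell from reading `anova.lm` and `summary.aov`, each column of
the model matrix is assigned to a term, and the Df of a term is the number of
its columns that survive the pivoted QR; a term left with no surviving column
gets no row at all. The seed and the names `kept`, `labels` and `df_by_term`
are my own choices, and `lm()` is used in place of `aov()` only to keep the
sketch short.

```r
set.seed(1)  # arbitrary seed; the post's y was not seeded

F <- factor(rep(1:2, each = 3))   # same F as in the post
G <- factor(rep(1:3, each = 2))   # same G as in the post
y <- rnorm(6, 2)

cells <- table(F, G)   # cells (F=1, G=3) and (F=2, G=1) have no data
cells

fit <- lm(y ~ F * G)

# Columns retained by the pivoted QR, and the term each one belongs to.
kept   <- fit$qr$pivot[seq_len(fit$rank)]
labels <- c("(Intercept)", attr(terms(fit), "term.labels"))
df_by_term <- table(factor(fit$assign[kept], levels = 0:3, labels = labels))
df_by_term

rownames(anova(fit))   # terms with zero surviving columns are absent
```

With only four occupied cells there are at most four estimable cell means,
and the intercept, F and G columns already use all four, so both interaction
columns are aliased and F:G is left with 0 Df. On that reading a row is
dropped only when every column of the term is aliased; a term with some
estimable columns would keep its row with a reduced Df.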
-- 
Yours sincerely,
Justin

