[R] Contrast matrices for nested factors

Tue Sep 7 05:01:49 CEST 2004

        Hi, I'd like to know if it's possible to specify different
contrast matrices in lm() for a factor that is nested within another one. This
is useful when we have a model where the nested factor has a different
number of levels, depending on the main factor.

        Let me illustrate with an example to make it clearer. Consider
the following data set:

        set.seed(1)
        y <- rnorm(14)
        a <- factor(c(rep(1,7),rep(2,3),rep(3,4)))
        b <- factor(c(1,1,1,2,2,3,3,1,1,2,1,1,2,2))
        k <- factor(c(1,2,3,1,2,1,2,1,2,1,1,2,1,2))
        internal <- data.frame(y,a,b,k)

        Where y is an arbitrary response, a is a main factor, b is a
factor nested within a, and k is the replicate number. It is easy to see
that depending on the level of a, b has different numbers of levels. For
instance, when a = 1, we have that b might assume values 1, 2 or 3,
while a = 2 or 3, b might assume only 1 or 2.

        I'd like then to use contrasts summing to 0, so I issue:

        z <- lm(y ~ a + a/b,data=internal,contrasts=list(a=contr.sum,
b=contr.sum))

        The problem is, the design matrix is not quite what I expected.
What happens is, instead of using a different contrast matrix for each
level of a where b is nested, it's using the same contrast matrix for
every b, namely:

        > contr.sum(3)
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1

        So, when a=1, the columns of the design matrix are as expected.
It sums to 0, because there are levels of b 1, 2 and 3, when a=1. But,
when a=2 or a=3, the same contrast matrix is being used, and then, the
factor effects do not sum to 0. That's obviously because there are no
 values for b equal 3, when a != 1, and then the coding that gets done is
 '0' or '1'.

        The design matrix lm() is creating is:

> model.matrix(z)
   (Intercept) a1 a2 a1:b1 a2:b1 a3:b1 a1:b2 a2:b2 a3:b2
1            1  1  0     1     0     0     0     0     0
2            1  1  0     1     0     0     0     0     0
3            1  1  0     1     0     0     0     0     0
4            1  1  0     0     0     0     1     0     0
5            1  1  0     0     0     0     1     0     0
6            1  1  0    -1     0     0    -1     0     0
7            1  1  0    -1     0     0    -1     0     0
8            1  0  1     0     1     0     0     0     0
9            1  0  1     0     1     0     0     0     0
10           1  0  1     0     0     0     0     1     0
11           1 -1 -1     0     0     1     0     0     0
12           1 -1 -1     0     0     1     0     0     0
13           1 -1 -1     0     0     0     0     0     1
14           1 -1 -1     0     0     0     0     0     1


        What I would like to use is:

   (Intercept) a1 a2 a1:b1 a2:b1 a3:b1 a1:b2    
1            1  1  0     1     0     0     0 0 0
2            1  1  0     1     0     0     0 0 0
3            1  1  0     1     0     0     0 0 0
4            1  1  0     0     0     0     1 0 0
5            1  1  0     0     0     0     1 0 0
6            1  1  0    -1     0     0    -1 0 0
7            1  1  0    -1     0     0    -1 0 0
8            1  0  1     0     1     0     0 0 0
9            1  0  1     0     1     0     0 0 0
10           1  0  1     0    -1     0     0 0 0
11           1 -1 -1     0     0     1     0 0 0
12           1 -1 -1     0     0     1     0 0 0
13           1 -1 -1     0     0    -1     0 0 0
14           1 -1 -1     0     0    -1     0 0 0

        (notice that in the second matrix all collumns sum to 0, in the
first they don't).


        Thank you,

--
Fernando Henrique Ferraz P. da Rosa
http://www.ime.usp.br/~feferraz