[Rd] Bugs? when dealing with contrasts

Berwin A Turlach berwin at maths.uwa.edu.au
Thu Apr 22 05:38:48 CEST 2010


G'day Gabor,

On Wed, 21 Apr 2010 14:00:33 -0400
Gabor Grothendieck <ggrothendieck at gmail.com> wrote:

> Below are two cases where I don't seem to be getting contr.sum
> contrasts even though they were specified.  Are these bugs?

Short answer: no. :)

> The first case is an interaction between continuous and factor
> variables.
> 
> The second case contrasts= was specified as an arg to lm.  The second
> works ok if we set the contrasts through options but not if we set it
> through an lm argument.
> 
> >
> > # 1. In this case I don't seem to be getting contr.sum contrasts:
> >
> > options(contrasts = c("contr.sum", "contr.poly"))
> > getOption("contrasts")
> [1] "contr.sum"  "contr.poly"
> > scores <- rep(seq(-2, 2), 3); scores
>  [1] -2 -1  0  1  2 -2 -1  0  1  2 -2 -1  0  1  2
> > fac <- gl(3, 5); fac
>  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
> Levels: 1 2 3
> >
> > # I get this:
> > model.matrix(~ scores:fac)
>    (Intercept) scores:fac1 scores:fac2 scores:fac3
> 1            1          -2           0           0
> 2            1          -1           0           0
> 3            1           0           0           0
> 4            1           1           0           0
> 5            1           2           0           0
> 6            1           0          -2           0
> 7            1           0          -1           0
> 8            1           0           0           0
> 9            1           0           1           0
> 10           1           0           2           0
> 11           1           0           0          -2
> 12           1           0           0          -1
> 13           1           0           0           0
> 14           1           0           0           1
> 15           1           0           0           2
> attr(,"assign")
> [1] 0 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$fac
> [1] "contr.sum"

When creating the model matrix, the various levels of a factor are
first coded as indicator variables (for creation of the main effects
and for creation of interactions).  The contrast matrix is then
applied to remove redundant columns from the design matrix; that is,
columns that are known to be redundant due to the *design* (i.e. model
formula as a whole), depending on whether data is available for all
levels of a factor (or combinations of levels if several factors are in
the model) you can still end up with a design matrix that does not have
full column rank.

In this case, since there is no main effect fac, the three columns that
are put into the design matrix due to the score:fac term do not create
a singularity, hence the contrast matrix is not applied to these three
columns.

The above description is probably a bit rough and lacking in detail (if
not even wrong in some details).  IMHO, the best explanation of how R
goes from the model formula to the actual design matrix (for linear
models) can still be found in MASS; though, if I am not mistaken,
current versions of R seem to proceed slightly different to that
description in more complicated models (i.e. several factors and
continuous explanatory variables).

> > # But I was expecting this since I am using contr.sum
> > cbind(1, model.matrix(~ fac)[,2:3] * scores)
>      fac1 fac2
> 1  1   -2    0
> 2  1   -1    0
> 3  1    0    0
> 4  1    1    0
> 5  1    2    0
> 6  1    0   -2
> 7  1    0   -1
> 8  1    0    0
> 9  1    0    1
> 10 1    0    2
> 11 1    2    2
> 12 1    1    1
> 13 1    0    0
> 14 1   -1   -1
> 15 1   -2   -2

If you wish to have the design matrix that contains the interaction
terms as if the model had the main effects too but without the columns
corresponding to the main effects, then just instruct R that you want
to have such a matrix:

R> model.matrix(~ scores*fac)[,-(2:4)]
   (Intercept) scores:fac1 scores:fac2
1            1          -2           0
2            1          -1           0
3            1           0           0
4            1           1           0
5            1           2           0
6            1           0          -2
7            1           0          -1
8            1           0           0
9            1           0           1
10           1           0           2
11           1           2           2
12           1           1           1
13           1           0           0
14           1          -1          -1
15           1          -2          -2

Though, I am not sure why one wants to fit such a model.  

> > # 2.
> > # here I don't get contr.sum but rather get contr.treatment
> > options(contrasts = c("contr.treatment", "contr.poly"))
> > getOption("contrasts")
> [1] "contr.treatment" "contr.poly"
> > model.matrix(lm(seq(15) ~ fac, contrasts = c("contr.sum",
> > "contr.poly")))
>    (Intercept) fac2 fac3
> 1            1    0    0
> 2            1    0    0
> 3            1    0    0
> 4            1    0    0
> 5            1    0    0
> 6            1    1    0
> 7            1    1    0
> 8            1    1    0
> 9            1    1    0
> 10           1    1    0
> 11           1    0    1
> 12           1    0    1
> 13           1    0    1
> 14           1    0    1
> 15           1    0    1
> attr(,"assign")
> [1] 0 1 1
> attr(,"contrasts")
> attr(,"contrasts")$fac
> [1] "contr.treatment"

No bug either, but wrong use of the optional argument "contrasts" for 
"lm".  Please read the help page of "lm", which points you to the help
page of "model.matrix.default", which contains pertinent examples; in
your case:

R> model.matrix(lm(seq(15) ~ fac, contrasts = list(fac="contr.sum")))  
   (Intercept) fac1 fac2
1            1    1    0
2            1    1    0
3            1    1    0
4            1    1    0
5            1    1    0
6            1    0    1
7            1    0    1
8            1    0    1
9            1    0    1
10           1    0    1
11           1   -1   -1
12           1   -1   -1
13           1   -1   -1
14           1   -1   -1
15           1   -1   -1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.sum"

HTH.

Cheers,

	Berwin

========================== Full address ============================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway                   
Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin



More information about the R-devel mailing list