[R] formula behaviour in model.matrix

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Feb 11 18:43:50 CET 2005


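See MASS4 (Venables & Ripley, Modern Applied Statistics with S, 4th
edition) pp. 149-150. I don't know of a similarly detailed explanation
elsewhere, although there is a terser account in the White Book
(Chambers & Hastie, Statistical Models in S).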

On Fri, 11 Feb 2005, Sundar Dorai-Raj wrote:

> Hi all,
>
> Perhaps somebody can explain the following behaviour to me.
>
> Take the following data.frame.
>
> z <- expand.grid(X = LETTERS[1:3], Y = letters[1:3])
>
> Now, from ?formula we see:
>
> <quote>
> The '*' operator denotes factor crossing: 'a*b' interpreted as 'a+b+a:b'.
> </quote>
>
> So I would expect the following two calls to be equivalent, and they are:
>
> ncol(model.matrix(~X*Y, z)) # returns 1 + 2 + 2 + 2 * 2 = 9
>
> ncol(model.matrix(~X + Y + X:Y, z)) # returns 1 + 2 + 2 + 2 * 2 = 9
>
> However, I did not expect this:
>
> ncol(model.matrix(~X:Y, z)) # returns 1 + 3 * 3 = 10
>
> Why isn't this 5? In other words, why doesn't "~X:Y" just denote the
> interaction term so that all you would get is an intercept plus the
> two-way interaction between X and Y (1 + 2 * 2 = 5 parameters)? Instead
> what is returned is the fully crossed effects (every level of X against
> every level of Y) plus an intercept. Is there something in the
> documentation I'm missing?
>
> --sundar
>
> P.S. This behaviour is identical in S-PLUS 6.2.
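
The column counts in the question can be reproduced, and the coding
inspected, with the short sketch below. It assumes the usual rule, as I
understand it from ?terms.object and the White Book: a factor inside an
interaction term is coded by contrasts only if the term with that factor
removed has already appeared in the formula; otherwise it is coded by
indicator columns for all of its levels. In ~X:Y neither X nor Y appears
as a main effect, so both get full indicator coding, giving 3 * 3 = 9
columns plus the intercept. The names mm_full, mm_inter and tf are
illustrative only.

```r
# Data frame from the question: two 3-level factors, fully crossed (9 rows).
z <- expand.grid(X = LETTERS[1:3], Y = letters[1:3])

mm_full  <- model.matrix(~ X * Y, z)   # intercept + X + Y + X:Y
mm_inter <- model.matrix(~ X:Y, z)     # intercept + X:Y only

ncol(mm_full)     # 9
ncol(mm_inter)    # 10

# The "factors" attribute of the terms object records the coding per term:
# 1 = coded by contrasts, 2 = coded by indicator columns for all levels.
tf <- attr(terms(~ X:Y), "factors")
tf

# Ten columns on nine distinct cells cannot all be independent: the
# intercept equals the sum of the nine cell indicators.
qr(mm_inter)$rank  # 9

# The 2 * 2 = 4 contrast-coded interaction columns are those of the full model.
grep(":", colnames(mm_full), value = TRUE)
```

Both matrices span the same 9-dimensional space, so the two formulas give
the same fitted values; with ~X:Y, lm() reports one coefficient as NA
because of the aliased column.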

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



