model.matrix() (PR#285)

Bill Venables William.Venables@cmis.CSIRO.AU
Wed, 22 Sep 1999 22:17:38 +1000


> I was alarmed to discover that model.matrix.default() can permute columns
> with respect to the formula.						

You are very easily alarmed, then.  The order of the terms is
relevant to a sequential anova and in such an anova it is a
normal and very reasonable strategy for main effects to be
included in the model before interactions (and it is only
interaction terms that are displaced, arranging them in
increasing order by degree).

> This seems to happen with user-defined components of the
> formula.  Thus
> 
> X <- matrix(1:4, 1, 4, dimnames = list(NULL, LETTERS[1:4]))
> Q <- function(x) x^2 # because model.matrix() does not like, eg, A:A
> 
> model.matrix(~ -1 + A + A:B + Q(C), data.frame(X))
> 
> has columns ordered A, Q(C), and A.B, not A, A.B, Q(C), as you might
> expect (and I certainly expected!). This appears to happen .Internal-ly. 

There is a standard method common to R and S of holding terms
together in the strict order they are generated that I suggest is
adequate.  Your model matrix, order intact, could be generated
using

> model.matrix(terms(~ -1 + A + A:B + I(C^2), keep.order = T), data.frame(X))

This is now a very standard idiom but not often needed.  Your
function Q() is not needed, by the way.  The I() function
(`identity') allows you to include arbitrary terms of this form.
This is also a standard and well known idiom.  (I won't ask in
what context you need a quadratic term in C but no linear term...)

There is, though, a clear improvement possible and I do suggest R
core consider it sympathetically.  For quantitative terms A:A
should be an allowable form for the quadratic term, but more
generally ~(x1 + x2 + x3 + x4)^3 should be an allowable form for
a complete 3rd order model, including powers.  I know this is
diffucult since it requires parsing and simplification to be
suspended until the classes of the terms are known, but it is
well worth the effort.

> While on the subject, would it be reasonable to standardise the names of
> interaction terms to be A:B not A.B, to match the terms in the formula.

I agree it would indeed.  The `dot' form is potentially ambiguous
but more seriously I get the horrible feeling I am back using
glim once more and it's plain creepy...

Bill Venables.

-- 
-----------------------------------------------------------------
Bill Venables, Statistician, CMIS Environmetrics Project.

Physical address:                            Postal address:
CSIRO Marine Laboratories,                   PO Box 120,       
233 Middle St, Cleveland, Queensland         Cleveland, Qld, 4163
AUSTRALIA                                    AUSTRALIA

Telephone: +61 7 3826 7251     Email: Bill.Venables@cmis.csiro.au     
      Fax: +61 7 3826 7304
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._