[R] Why does the order of terms in a formula translate into different models/ model matrices?

peter dalgaard pdalgd at gmail.com
Sun Jan 29 11:24:52 CET 2012


On Jan 29, 2012, at 02:42 , Ben Bolker wrote:
>> 
> 
> (quoting removed to make Gmane happy)
> 
> AFAICS, this is a bug.
> 
> 
>  I think so too, although I haven't got my head around it yet.
> 
>  Chuck, are you willing to post a summary of this to r-devel
> for discussion ... and/or post a bug report?

You have to be very specific about what the bug is, if any... I.e., which precisely are the rules that are broken by the current behavior? 

Preferably also suggest a fix --- the corner cases of model.matrix and friends is some of the more impenetrable code in the R sources.

Notice that order dependent parametrization of terms is not a bug per se, nor is the automatic switch to dummy coding of factors. Consider these cases:

d <- cbind(expand.grid(a=c("a","b"),b=c("X","Y"),c=c("U","W")),x=1:8)
model.matrix(~ a:b + a:c, d)
model.matrix(~ a:c + a:b, d)
model.matrix(~ a:b + a:x, d)
model.matrix(~ a:x + a:b, d)

and notice that the logic applying to "x" is the same as that applying to "c". 

The crux seems to be that the model ~a:c contains the model ~a whereas ~a:x does not, and hence the rationale for _not_ expanding a subsequent a:b term to dummies (namely that ~a is "already in" the model) fails. 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list