[Rd] y ~ X -1 , X a matrix

Peter Dalgaard pdalgd at gmail.com
Thu Mar 18 08:17:18 CET 2010

Ross Boylan wrote:
> On Thu, 2010-03-18 at 00:57 +0000, Ted.Harding at manchester.ac.uk wrote:
>> On 17-Mar-10 23:32:41, Ross Boylan wrote:
>>> While browsing some code I discovered a call to lm that used
>>> a formula y ~ X - 1, where X was a matrix.
>>> Looking through the documentation of formula, lm, model.matrix
>>> and maybe some others I couldn't find this usage (R 2.10.1).
>>> Is it anything I can count on in future versions?  Is there
>>> documentation I've overlooked?
>>> For the curious: model.frame on the above equation returns a
>>> data.frame with 2 columns.  The second "column" is the whole X
>>> matrix. model.matrix on that object returns the expected matrix,
>>> with the transition from the odd model.frame to the regular
>>> matrix happening in an .Internal call.
>>> Thanks.
>>> Ross
>>> P.S. I would appreciate cc's, since mail problems are preventing
>>> me from seeing list mail.
>> Hmmm ... I'm not sure what the problem is with what you describe.
> There is no problem in the "it doesn't work" sense.
> The problem is that it seems undocumented (though the help you quote
> could, rather indirectly, be taken as a clue) and thus possibly subject
> to change in later releases.
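The behaviour Ross describes can be reproduced with a short R sketch. The data are simulated, and the names X, y, mf and mm are illustrative only. The sketch shows that model.frame() keeps X as a single matrix column, and that model.matrix() expands it into one column per matrix column.

```r
# Simulated data (illustrative): a 10 x 2 matrix predictor and a response
set.seed(1)
X <- matrix(rnorm(20), ncol = 2)
y <- rnorm(10)

# The model frame has two "columns": y, and the whole X matrix
mf <- model.frame(y ~ X - 1)
print(ncol(mf))        # number of variables in the frame
print(dim(mf$X))       # the X "column" is itself a matrix

# model.matrix() expands the matrix column into ordinary columns,
# named by pasting the variable name and the column index
mm <- model.matrix(y ~ X - 1, mf)
print(colnames(mm))

# lm() on the same formula fits one coefficient per column of X
fit <- lm(y ~ X - 1)
print(coef(fit))
```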

I'm pretty sure that it is per original design that data frames can have
matrix columns, although data.frame() and as.data.frame() are quite
trigger-happy when it comes to converting them to individual columns.
You need things like d <- data.frame(X=I(X)) to prevent it.
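The following minimal R sketch shows the contrast, using an arbitrary small matrix (the names d1, d2 and d3 are illustrative). Passing a matrix straight to data.frame() splits it, while wrapping it in I() keeps it as one matrix column. Assigning with $<- on an existing data frame also appears to keep the matrix intact.

```r
X <- matrix(1:6, ncol = 2)    # 3 x 2 matrix without column names

# data.frame() splits the matrix into separate columns X.1, X.2
d1 <- data.frame(X = X)
print(names(d1))

# I() protects it: one column that is itself a 3 x 2 matrix
d2 <- data.frame(X = I(X))
print(names(d2))
print(dim(d2$X))

# Assigning into an existing data frame also keeps the matrix whole
d3 <- data.frame(id = 1:3)
d3$X <- X
print(ncol(d3))
```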

As you have seen, matrices can be handy on the RHS of formulas, but
there are at least two cases where they are crucial on the LHS:
multivariate linear models, and the version of glm(Y ~ ..., binomial)
in which Y is a two-column matrix of successes and failures.
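Here is a sketch of the two LHS cases in R. The simulated data and the names Y, succ and size are assumptions made for illustration. In both fits the response stored in the model frame is a single matrix.

```r
set.seed(42)
n <- 20
x <- rnorm(n)

# Multivariate linear model: a two-column matrix response gives an "mlm" fit
Y <- cbind(y1 = 1 + 2 * x + rnorm(n), y2 = -1 + 0.5 * x + rnorm(n))
fit_mlm <- lm(Y ~ x)
print(class(fit_mlm))
print(coef(fit_mlm))           # one column of coefficients per response

# Binomial GLM with a (successes, failures) matrix response
size <- rep(10, n)
succ <- rbinom(n, size, plogis(0.3 * x))
fit_bin <- glm(cbind(succ, size - succ) ~ x, family = binomial)
print(coef(fit_bin))

# The response kept in the model frame is the whole n x 2 matrix
print(dim(model.response(model.frame(fit_bin))))
```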

Without being able to store matrices as individual components in a data
frame, I don't think you can avoid internally expanding model formulas
into (say) Y ~ X1 + X2 - 1, which could get rather unwieldy, so I don't
think the feature will be going away. (Someone with too much time on
his/her hands might want to rationalize the whole data frame concept, but
that should go in the direction of handling all matrix-like structures
consistently, including date-time objects etc.)
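A sketch in R of the expansion that matrix columns let you avoid. It uses simulated data, and the column names X1..X3 are chosen for illustration. The compact y ~ X - 1 fit gives the same coefficients as the spelled-out y ~ X1 + X2 + X3 - 1, built here with reformulate().

```r
set.seed(7)
X <- matrix(rnorm(30), ncol = 3)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(10, sd = 0.1)

# Compact form: the whole matrix enters as one term
fit_mat <- lm(y ~ X - 1)

# Expanded form: one named column (and one formula term) per matrix column
colnames_X <- paste0("X", seq_len(ncol(X)))
d <- data.frame(y = y, setNames(as.data.frame(X), colnames_X))
f_exp <- reformulate(colnames_X, response = "y", intercept = FALSE)
print(f_exp)                   # y ~ X1 + X2 + X3 - 1
fit_exp <- lm(f_exp, data = d)

# Same fit either way
print(all.equal(coef(fit_mat), coef(fit_exp)))
```

With many columns, the compact form saves writing out every term, which is the unwieldiness mentioned above.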

Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
