[R] lm/model.matrix confusion (? bug)

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Dec 12 11:37:03 CET 2007

On Wed, 2007-12-12 at 02:05 -0800, Mark Difford wrote:
> Dear List-members,
> Hopefully someone will help through my confusion:
> In order to get the same coefficients as we get from the following
> ##
> require (MASS)
> summary ( lm(Gas ~ Insul/Temp - 1, data = whiteside) )
> ......................
> we need to do the following (if we use model.matrix to specify the model)
> ##
> summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data = whiteside) )
> ......................
> That is, we need to take out "two intercepts."  Is this "correct"?

Yes - if you insist on doing things that way

> Shouldn't lm check to see if an intercept has been requested as part of the
> model formula?

It does, (i.e. whether the user specified -1 in the formula argument),
but you specified it inside model.matrix() so the formula parsing code
never sees it. One wouldn't want lm to need to know about all functions
that have/could ever be written in R, past or future, to know how to
divine that here you wanted "-1" to mean "remove intercept please Mr. R
Parser" and not "subtract 1 from this result please Mr. R Parser".

You are really abusing the reason for having the formula interface. The
whole point of it is to make it easy for user to specify a model, from
which R generates the model matrix for you. Why use the formula at all
if you have already produced your model matrix?

See ?lm.fit for a fast way of fitting linear models (used within lm() )
if you have all the bits in place (i.e. you already have a model matrix)
and are happy to take care of other details yourself.

<snip />
> (Sorry, this is getting really long) --- So, my question/confusion comes
> down to wanting to know why lm() doesn't check to see if an intercept has
> been specified when the model has been specified using model.matrix.

I guess because you can't programme around every possible usage that a
user might dream up to try. If it did, lm would be an absolute pig and
run like one. It is slow enough as it is with all the user-friendly
stuff in there to help. (By slow enough, I mean it is perfectly speedy
in routine use, but you wouldn't want to use it in a permutation
test-like environment if you were after speed - you'd be better off
working with lm.fit directly)

The model.matrix bit is returning a matrix (without an entry for an
intercept), which is fine on the rhs of a formula. As all you've done
here is (effectively) create the following model:

> form <- formula(paste("Gas ~", 
+                 paste(colnames(model.matrix (~ Insul/Temp-1,
+                                data=whiteside)), collapse = " + ")))
> form
Gas ~ InsulBefore + InsulAfter + InsulBefore:Temp + InsulAfter:Temp

I.e., that is what your convoluted formula is being interpreted as, you
then still need to remove the intercept that R will automagically add to
the model when it subsequently creates the model matrix.



> Regards,
> Mark.
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk

More information about the R-help mailing list