[R] lm/model.matrix confusion (? bug)
gavin.simpson at ucl.ac.uk
Wed Dec 12 11:37:03 CET 2007
On Wed, 2007-12-12 at 02:05 -0800, Mark Difford wrote:
> Dear List-members,
> Hopefully someone will help through my confusion:
> In order to get the same coefficients as we get from the following
> require (MASS)
> summary ( lm(Gas ~ Insul/Temp - 1, data = whiteside) )
> we need to do the following (if we use model.matrix to specify the model)
> summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data = whiteside) )
> That is, we need to take out "two intercepts." Is this "correct"?
Yes - if you insist on doing things that way
> Shouldn't lm check to see if an intercept has been requested as part of the
> model formula?
It does, (i.e. whether the user specified -1 in the formula argument),
but you specified it inside model.matrix() so the formula parsing code
never sees it. One wouldn't want lm to need to know about all functions
that have/could ever be written in R, past or future, to know how to
divine that here you wanted "-1" to mean "remove intercept please Mr. R
Parser" and not "subtract 1 from this result please Mr. R Parser".
You are really abusing the reason for having the formula interface. The
whole point of it is to make it easy for user to specify a model, from
which R generates the model matrix for you. Why use the formula at all
if you have already produced your model matrix?
See ?lm.fit for a fast way of fitting linear models (used within lm() )
if you have all the bits in place (i.e. you already have a model matrix)
and are happy to take care of other details yourself.
> (Sorry, this is getting really long) --- So, my question/confusion comes
> down to wanting to know why lm() doesn't check to see if an intercept has
> been specified when the model has been specified using model.matrix.
I guess because you can't programme around every possible usage that a
user might dream up to try. If it did, lm would be an absolute pig and
run like one. It is slow enough as it is with all the user-friendly
stuff in there to help. (By slow enough, I mean it is perfectly speedy
in routine use, but you wouldn't want to use it in a permutation
test-like environment if you were after speed - you'd be better off
working with lm.fit directly)
The model.matrix bit is returning a matrix (without an entry for an
intercept), which is fine on the rhs of a formula. As all you've done
here is (effectively) create the following model:
> form <- formula(paste("Gas ~",
+ paste(colnames(model.matrix (~ Insul/Temp-1,
+ data=whiteside)), collapse = " + ")))
Gas ~ InsulBefore + InsulAfter + InsulBefore:Temp + InsulAfter:Temp
I.e., that is what your convoluted formula is being interpreted as, you
then still need to remove the intercept that R will automagically add to
the model when it subsequently creates the model matrix.
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
More information about the R-help