R-beta: formula() and model formulae

Bill Venables wvenable at attunga.stats.adelaide.edu.au
Wed May 7 10:54:03 CEST 1997


Peter Dalgaard writes:
 > > 
 > > 2) if x is of mode numeric, then the model formula
 > > mymod <- lm(y ~ x + x^2)
 > > is not processed as S would do it.  The model is fit[ted]
 > > ignoring the x^2 term...
 > 
 > We had that topic a while back.  I think it was concluded that
 > it is a feature, because mixing model formulas and arithmetic
 > ditto is bad practice.

I don't recall we did, but in any case I'd like to re-open it.

There is an anomaly in the way : and ^ terms are handled in the
sense that the logical and useful thing is obvious but does not
happen.  Let me give an example.  Suppose a and b are factors, x
and y are not.

A term such as (a + b + x + y)^2 should be expanded out binomial
fashion, coefficients stripped away and the remaining products
treated as : products.  Then S copes with terms like a:a, a:b and
a:x fine, even x:y is handled by having it generate a column of
xy-products, as it should.
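
As a rough illustration, the expansion described above can be inspected with terms() in a present-day R session (not the R-beta under discussion here). The data frame d and the response resp are made up for the example; only the roles of a, b, x and y come from the text.

```r
# Hypothetical data: a and b are factors, x and y are numeric.
set.seed(1)
d <- data.frame(
  a    = factor(rep(c("a1", "a2"), each = 6)),
  b    = factor(rep(c("b1", "b2", "b3"), times = 4)),
  x    = rnorm(12),
  y    = rnorm(12),
  resp = rnorm(12)
)

# Expand the formula symbolically; no model is fitted here.
tt <- terms(resp ~ (a + b + x + y)^2)
print(attr(tt, "term.labels"))  # main effects plus all pairwise ":" terms

# For two numeric variables, x:y is a single column of elementwise products.
mm <- model.matrix(resp ~ x:y, data = d)
print(all.equal(unname(mm[, "x:y"]), d$x * d$y))
```

The term labels contain the four main effects and the six distinct pairwise products, with no a:a or x:x entries.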

But a term such as x:x does not generate a column of x-squares,
it is merely removed as it would be if it were a factor.  This is
a complete anomaly, and one that I don't think would be hard or
dangerous for R to rectify.  Indeed it would be very useful to
generate a complete second degree regression in three variables
using y ~ (1 + x1 + x2 + x3)^2.  As it is now it generates linear
and product terms only and omits the powers.  Go figure.
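
A minimal sketch of this behaviour, and of one common workaround, in a present-day R session: the data are simulated and the names x1, x2, x3 and y are chosen only to match the formula above. Wrapping the powers in I() is an assumption about how one might get the squared columns today, not the change proposed in this message.

```r
# Simulated data for illustration only.
set.seed(2)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
d$y <- 1 + d$x1 + d$x2^2 + rnorm(20)

# In a formula, ^2 means crossing to second order: linear and product terms only.
labs_cross <- attr(terms(y ~ (x1 + x2 + x3)^2), "term.labels")
print(labs_cross)

# x1:x1 collapses to x1, just as a:a would for a factor.
labs_self <- attr(terms(y ~ x1:x1), "term.labels")
print(labs_self)

# Workaround: I() protects arithmetic, so explicit squares can be added.
fit <- lm(y ~ (x1 + x2 + x3)^2 + I(x1^2) + I(x2^2) + I(x3^2), data = d)
print(names(coef(fit)))
```

The last fit has ten coefficients: the intercept, three linear terms, three squares and three cross products, which is the complete second-degree surface.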

 > (I don't have any strong feeling about this, personally.  As
 > long as R won't introduce those awful Helmert contrasts as
 > default...)

Ah, the Helmert contrasts bête noire.  For ANOVA the contrast
matrix used is mostly irrelevant.  For regression models I agree,
treatment contrasts would be generally more easily interpreted.

I presume the reason they were used at all is that if you have
equal replication of everything the Helmert contrasts give you a
model matrix with orthogonal columns, so all estimates are
uncorrelated.  Whenever do you get equal replication, though?
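
The orthogonality claim can be checked numerically. The sketch below, for a present-day R session, uses a made-up one-way layout with three levels; the factor f and the choice of four replicates are assumptions for illustration.

```r
# Balanced one-way layout: 3 levels, 4 replicates each (hypothetical).
f <- factor(rep(c("A", "B", "C"), each = 4))

X_helm <- model.matrix(~ f, contrasts.arg = list(f = "contr.helmert"))
X_trt  <- model.matrix(~ f, contrasts.arg = list(f = "contr.treatment"))

# X'X is diagonal for Helmert contrasts, so the columns are orthogonal.
XtX_helm <- crossprod(X_helm)
XtX_trt  <- crossprod(X_trt)
print(XtX_helm)
print(XtX_trt)

# Drop one observation: replication is no longer equal, orthogonality is lost.
f2 <- f[-1]
X_unbal   <- model.matrix(~ f2, contrasts.arg = list(f2 = "contr.helmert"))
XtX_unbal <- crossprod(X_unbal)
print(XtX_unbal)
```

With equal replication the off-diagonal entries of X'X vanish under Helmert coding but not under treatment coding; removing a single observation is enough to make them nonzero under Helmert coding as well.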

-- 
Bill Venables, Head, Dept of Statistics,    Tel.: +61 8 8303 5418
University of Adelaide,                     Fax.: +61 8 8303 3696
South AUSTRALIA.     5005.   Email: Bill.Venables at adelaide.edu.au