[R] Simple question about formulae in R!?

Joshua Wiley jwiley.psych at gmail.com
Fri Aug 10 19:02:30 CEST 2012


On Fri, Aug 10, 2012 at 9:16 AM, S Ellison <S.Ellison at lgcgroup.com> wrote:
>> > R in general tries hard to prohibit this behavior (i.e.,  including an
>> > interaction but not the main effect). When removing a main effect and
>> > leaving the interaction, the number of parameters is not reduced by
>> > one (as would be expected) but stays the same, at least
>> > when using model.matrix:
>
> Surely this behaviour is less to do with a dislike of interactions without both main effects (which we will necessarily use if we fit a simple two-factor nested model) than the need to avoid non-uniqueness of a model fitted with too many coefficients?
> In a simple case, an intercept plus n coefficients for n factor levels gives us n+1 coefficients to find, and we only have n independent groups to estimate them from. In model matrix terms we would have one column that is a linear combination of others. For OLS normal equations that generates a zero determinant and for the numerical methods R uses the effect is the same; no useful fit. To avoid that and allow least squares fitting, R sets up the model matrix with only n-1 coefficients in addition to the intercept. As a result we end up with fewer model coefficients than we might have expected (and that annoyingly missing first level that always puzzles newcomers the first time we look at a linear model summary), but we have exactly the number of coefficients that we can estimate uniquely from the groups we have specified.
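
A minimal sketch of the point above, assuming a made-up three-level factor `g` (the name and data are hypothetical): model.matrix() produces an intercept plus n-1 dummy columns, while adding a dummy for every level by hand gives one more column without increasing the rank.

```r
# Hypothetical three-level factor, two observations per level
g <- factor(rep(c("a", "b", "c"), each = 2))

# Default treatment contrasts: intercept + (n - 1) dummy columns
X <- model.matrix(~ g)
colnames(X)        # "(Intercept)" "gb" "gc"
qr(X)$rank         # 3, i.e. full column rank

# Overparameterized version: intercept + one dummy per level
X_over <- cbind(1, model.matrix(~ 0 + g))
ncol(X_over)       # 4 columns
qr(X_over)$rank    # still 3: one column is a linear combination of the others
```

The missing first level ("a" here) is absorbed into the intercept, which is why it never shows up in the coefficient table.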

N.B. Off topic.

This is an incredibly nice feature of R.  SAS overparameterizes the
design matrix and employs the sweep algorithm to zero out redundant
parameters.  One consequence is that if you want to specify your own
data to post-multiply by the coefficient vector, you need to realize
that the vector has more elements than there are nonmissing
parameters.  I struggle to imagine this is more computationally
efficient than simply creating an appropriately parameterized design
matrix, although I suppose in either case you need to check for
less-than-full-rank design matrices.
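
For what it is worth, a similar situation can be reproduced in R by handing lm() an overparameterized matrix yourself. The sketch below is only an illustration on simulated data (`g`, `y`, and `X_over` are made-up names), and it says nothing about how SAS does this internally. lm() reports NA for the aliased coefficient, so the coefficient vector is longer than the number of estimated parameters and the NA entry has to be dropped before multiplying by hand.

```r
set.seed(1)

# Simulated data: three groups, four observations each (hypothetical)
g <- factor(rep(c("a", "b", "c"), each = 4))
y <- rnorm(12)

# Deliberately overparameterized design: intercept + one dummy per level
X_over <- cbind(Intercept = 1, model.matrix(~ 0 + g))

fit <- lm(y ~ 0 + X_over)
b <- coef(fit)       # four entries, one of them NA (aliased column)
keep <- !is.na(b)
sum(keep)            # number of coefficients actually estimated
fit$rank             # rank of the design matrix as found by lm()

# Post-multiply the data by the coefficient vector, skipping the NA entry
by_hand <- drop(X_over[, keep] %*% b[keep])
all.equal(unname(by_hand), unname(fitted(fit)))
```

With the default model.matrix() coding none of this bookkeeping is needed, because the redundant column is never created.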

>
> S
>


