[R] Simple question about formulae in R!?

S Ellison S.Ellison at LGCGroup.com
Fri Aug 10 18:16:14 CEST 2012


> > R in general tries hard to prohibit this behavior (i.e., including an 
> > interaction but not the main effect). When removing a main effect and 
> > leaving the interaction, the number of parameters is not reduced by 
> > one (as would be expected) but stays the same, at least 
> > when using model.matrix:

Surely this behaviour has less to do with a dislike of interactions without both main effects (which we necessarily fit whenever we use a simple two-factor nested model; in R, y ~ A/B expands to y ~ A + A:B) than with the need to avoid non-uniqueness in a model fitted with too many coefficients? 
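The quoted model.matrix observation can be checked with a simple column count for two factors A (a levels) and B (b levels). The sketch below is Python rather than R, and the helper names ncol_crossed and ncol_nested are hypothetical. It assumes R's default treatment contrasts and R's coding rule for a factor whose margin is missing from the model: in A:B without a B main effect, A gets a full set of indicators and B gets contrasts.

```python
# Hypothetical helpers: count the model-matrix columns R would build under
# default treatment contrasts for factors A (a levels) and B (b levels).

def ncol_crossed(a: int, b: int) -> int:
    """Columns for y ~ A*B, i.e. y ~ A + B + A:B."""
    intercept = 1
    main_a = a - 1               # A coded by contrasts
    main_b = b - 1               # B coded by contrasts
    inter = (a - 1) * (b - 1)    # both margins present: contrasts for both
    return intercept + main_a + main_b + inter


def ncol_nested(a: int, b: int) -> int:
    """Columns for y ~ A + A:B (what y ~ A/B expands to)."""
    intercept = 1
    main_a = a - 1
    # B's main effect is absent, so inside A:B the factor A is coded by
    # a indicators and B by (b - 1) contrasts.
    inter = a * (b - 1)
    return intercept + main_a + inter


for a, b in [(2, 2), (3, 4)]:
    print(a, b, ncol_crossed(a, b), ncol_nested(a, b))
# prints:
# 2 2 4 4
# 3 4 12 12
```

Both counts reduce to a*b, one coefficient per cell. Dropping the B main effect changes the parameterisation but not the number of coefficients the cell means can identify.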
In a simple case, an intercept plus n coefficients for n factor levels gives us n+1 coefficients to find, but we only have n independent groups to estimate them from. In model-matrix terms, one column is a linear combination of the others (the intercept column equals the sum of the n level indicators). For the OLS normal equations, X'X is then singular (zero determinant), and the numerical methods R uses (a QR decomposition rather than the normal equations) run into the same rank deficiency: there is no unique least-squares solution. To avoid that and allow a unique fit, R's default treatment contrasts set up the model matrix with only n-1 factor columns in addition to the intercept, and the first level becomes the baseline absorbed into the intercept. As a result we end up with fewer model coefficients than we might have expected (hence that annoyingly missing first level that puzzles newcomers the first time they look at a linear model summary), but we have exactly the number of coefficients that we can estimate uniquely from the groups we have specified.
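The rank argument can be illustrated numerically. This is a minimal Python sketch, not R's own code. The helpers matrix_rank, crossprod, full_dummy_matrix and treatment_matrix are hypothetical, and exact rational elimination stands in for R's QR. It builds an intercept plus all n level indicators, and then the treatment-coded version with the first level as baseline, for a toy design of three groups with two observations each.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    nrows = len(m)
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        # Find a pivot at or below row `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, nrows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(nrows):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def crossprod(X):
    """X'X, the matrix of the OLS normal equations."""
    p = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]


def full_dummy_matrix(groups, n):
    """Intercept plus one indicator per level: n + 1 columns."""
    return [[1] + [1 if g == k else 0 for k in range(n)] for g in groups]


def treatment_matrix(groups, n):
    """Intercept plus indicators for levels 1..n-1 (level 0 is the baseline)."""
    return [[1] + [1 if g == k else 0 for k in range(1, n)] for g in groups]


groups = [0, 0, 1, 1, 2, 2]  # toy data: 3 levels, 2 observations each
n = 3
X_full = full_dummy_matrix(groups, n)
X_trt = treatment_matrix(groups, n)

print(len(X_full[0]), matrix_rank(X_full))   # 4 3  (rank-deficient)
print(len(X_trt[0]), matrix_rank(X_trt))     # 3 3  (full column rank)
print(matrix_rank(crossprod(X_full)))        # 3    (4x4 X'X is singular)
```

The full-dummy matrix has 4 columns but rank 3, so its 4x4 X'X is singular, which is the zero determinant above. Dropping the baseline indicator leaves 3 columns of rank 3 and a unique solution.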

S
