[Rd] Bug in model.matrix.default for higher-order interaction encoding when specific model terms are missing

Arie ten Cate arietencate at gmail.com
Tue Nov 7 06:41:24 CET 2017


Hello Tyler,

Consider your example:

model.matrix(~(X1+X2+X3)^3-X1:X3)

Here T_i = X1:X2:X3. Let F_j = X3. (The numerical variables X1 and X2
are not encoded at all.) Then, again, T_{i(j)} = X1:X2, which in this
example is NOT dropped from the model. Hence the X3 in T_i must be
encoded by contrasts, as indeed it is.
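
One quick way to check which terms remain after the removal (a minimal
sketch; "term.labels" is the attribute of the terms object that lists
the retained terms):

attr(terms(~(X1+X2+X3)^3 - X1:X3), "term.labels")
# expected: "X1" "X2" "X3" "X1:X2" "X2:X3" "X1:X2:X3"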

  Arie

On Mon, Nov 6, 2017 at 5:09 PM, Tyler <tylermw at gmail.com> wrote:
> Hi Arie,
>
> Given the heuristic, in all of my examples with a missing two-factor
> interaction the three-factor interaction should be coded with dummy
> variables. In reality, it is encoded by dummy variables only when the
> numeric:numeric interaction is missing, and by contrasts for the other two.
> The heuristic does not specify separate behavior for numeric vs. categorical
> factors (when the author of Statistical Models in S refers to F_j as a
> "factor", it is a more general usage than the R type "factor" and includes
> numeric variables--the language used later in the chapter, on page 40,
> confirms this): when there is a missing marginal term in the formula, the
> higher-order interaction should be coded by dummy variables, regardless of
> type. Thus, the terms() function is only following the cited behavior one
> third of the time.
>
> Best regards,
> Tyler
>
> On Mon, Nov 6, 2017 at 6:45 AM, Arie ten Cate <arietencate at gmail.com> wrote:
>>
>> Hello Tyler,
>>
>> You write that you understand what I am saying. However, I am now at a
>> loss about what exactly is the problem with the behavior of R. Here is
>> a script which reproduces your experiments with three variables
>> (excluding the full model):
>>
>> m=expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))
>> model.matrix(~(X1+X2+X3)^3-X1:X3,data=m)
>> model.matrix(~(X1+X2+X3)^3-X2:X3,data=m)
>> model.matrix(~(X1+X2+X3)^3-X1:X2,data=m)
>>
>> Below are the three results, similar to your first mail. (The first
>> two are basically the same, of course.) Please pick one result which
>> you think is not consistent with the heuristic and please give what
>> you think is the correct result:
>>
>> model.matrix(~(X1+X2+X3)^3-X1:X3)
>>   (Intercept)
>>   X1 X2 X3B X3C
>>   X1:X2 X2:X3B X2:X3C
>>   X1:X2:X3B X1:X2:X3C
>>
>> model.matrix(~(X1+X2+X3)^3-X2:X3)
>>   (Intercept)
>>   X1 X2 X3B X3C
>>   X1:X2 X1:X3B X1:X3C
>>   X1:X2:X3B X1:X2:X3C
>>
>> model.matrix(~(X1+X2+X3)^3-X1:X2)
>>   (Intercept)
>>   X1 X2 X3B X3C
>>   X1:X3B X1:X3C X2:X3B X2:X3C
>>   X1:X2:X3A X1:X2:X3B X1:X2:X3C
>>
>> (I take it that the combination of X3A and X3B and X3C implies dummy
>> encoding, and the combination of only X3B and X3C implies contrasts
>> encoding, with respect to X3A.)
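>>
>> (A minimal sketch of the distinction, using a single three-level
>> factor; the column names assume the default treatment contrasts:
>>
>> x <- factor(c("A","B","C"))
>> model.matrix(~x)      # contrasts coding: (Intercept), xB, xC
>> model.matrix(~x - 1)  # dummy coding: xA, xB, xC, one indicator per level
>>
>> Contrasts coding measures B and C against the reference level A, while
>> dummy coding gives every level its own column.)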
>>
>> Thanks in advance,
>>
>> Arie
>>
>>
>> On Sat, Nov 4, 2017 at 5:33 PM, Tyler <tylermw at gmail.com> wrote:
>> > Hi Arie,
>> >
>> > I understand what you're saying. The following excerpt from the book
>> > shows that F_j does not refer exclusively to categorical factors:
>> > "...the rule does not do anything special for them, and it remains
>> > valid, in a trivial sense, whenever any of the F_j is numeric rather
>> > than categorical." Since F_j refers to both categorical and numeric
>> > variables, the behavior of model.matrix is not consistent with the
>> > heuristic.
>> >
>> > Best regards,
>> > Tyler
>> >
>> > On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate <arietencate at gmail.com>
>> > wrote:
>> >>
>> >> Hello Tyler,
>> >>
>> >> I rephrase my previous mail, as follows:
>> >>
>> >> In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical
>> >> variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2,
>> >> which in the example is dropped from the model. Hence the X3 in T_i
>> >> must be encoded by dummy variables, as indeed it is.
>> >>
>> >>   Arie
>> >>
>> >>
>> >> On Thu, Nov 2, 2017 at 4:11 PM, Tyler <tylermw at gmail.com> wrote:
>> >> > Hi Arie,
>> >> >
>> >> > The book on which this behavior is based does not use "factor" (in
>> >> > this section) to refer to a categorical factor. I will again point
>> >> > to this sentence, from page 40, in the same section and referring
>> >> > to the behavior in question, which shows F_j is not limited to
>> >> > categorical factors: "Numeric variables appear in the computations
>> >> > as themselves, uncoded. Therefore, the rule does not do anything
>> >> > special for them, and it remains valid, in a trivial sense,
>> >> > whenever any of the F_j is numeric rather than categorical."
>> >> >
>> >> > Note the "... whenever any of the F_j is numeric rather than
>> >> > categorical." Factor here is used in the more general sense of the
>> >> > word, not referring to the R type "factor." The behavior of R does
>> >> > not match the heuristic that it cites.
>> >> >
>> >> > Best regards,
>> >> > Tyler
>> >> >
>> >> > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <arietencate at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hello Tyler,
>> >> >>
>> >> >> Thank you for searching for, and finding, the basic description of
>> >> >> the behavior of R in this matter.
>> >> >>
>> >> >> I think your example is in agreement with the book.
>> >> >>
>> >> >> But let me first note the following. You write: "F_j refers to a
>> >> >> factor (variable) in a model and not a categorical factor". However:
>> >> >> "a factor is a vector object used to specify a discrete
>> >> >> classification" (start of chapter 4 of "An Introduction to R"). You
>> >> >> might also see the description of the R function factor().
>> >> >>
>> >> >> You note that the book says about a factor F_j:
>> >> >>   "... F_j is coded by contrasts if T_{i(j)} has appeared in the
>> >> >> formula and by dummy variables if it has not"
>> >> >>
>> >> >> You find:
>> >> >>   "However, the example I gave demonstrated that this dummy variable
>> >> >> encoding only occurs for the model where the missing term is the
>> >> >> numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2."
>> >> >>
>> >> >> We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then
>> >> >> T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in
>> >> >> T_i must be encoded by dummy variables, as indeed it is.
>> >> >>
>> >> >>   Arie
>> >> >>
>> >> >> On Tue, Oct 31, 2017 at 4:01 PM, Tyler <tylermw at gmail.com> wrote:
>> >> >> > Hi Arie,
>> >> >> >
>> >> >> > Thank you for your further research into the issue.
>> >> >> >
>> >> >> > Regarding Stata: on the other hand, JMP gives model matrices that
>> >> >> > use the main effects contrasts in computing the higher-order
>> >> >> > interactions, without the dummy variable encoding. I verified this
>> >> >> > both by analyzing the linear model given in my first example and
>> >> >> > noting that JMP has one more degree of freedom than R for the same
>> >> >> > model, and by looking at the generated model matrices. It's easy
>> >> >> > to find a design where JMP will allow us to fit our model with
>> >> >> > goodness-of-fit estimates and R will not, due to the extra
>> >> >> > degree(s) of freedom required. Let's keep the conversation limited
>> >> >> > to R.
>> >> >> >
>> >> >> > I want to refocus on my original bug report, which was not about a
>> >> >> > missing main effects term, but rather about a missing lower-order
>> >> >> > interaction term. The behavior of model.matrix.default() for a
>> >> >> > missing main effects term is a nice example to demonstrate how
>> >> >> > model.matrix encodes with dummy variables instead of contrasts,
>> >> >> > but it doesn't demonstrate the inconsistent behavior my bug report
>> >> >> > highlighted.
>> >> >> >
>> >> >> > I went looking for documentation on this behavior, and the issue
>> >> >> > stems not from model.matrix.default(), but rather from the terms()
>> >> >> > function's interpretation of the formula. This "clever" replacement
>> >> >> > of contrasts by dummy variables to maintain marginality (presuming
>> >> >> > that's the reason) is not described anywhere in the documentation
>> >> >> > for either the model.matrix() or the terms() function. In order to
>> >> >> > find a description of the behavior, I had to look in the underlying
>> >> >> > C code, buried above the "TermCode" function of the "model.c" file,
>> >> >> > which says:
>> >> >> >
>> >> >> > "TermCode decides on the encoding of a model term. Returns 1 if
>> >> >> > variable
>> >> >> > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2
>> >> >> > if
>> >> >> > it
>> >> >> > is to
>> >> >> > be encoded by dummy variables.  This is decided using the
>> >> >> > heuristic
>> >> >> > described in Statistical Models in S, page 38."
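>> >> >> >
>> >> >> > That decision is also visible from R, in the "factors" attribute
>> >> >> > of the terms object (a sketch; going by TermCode, an entry of 1
>> >> >> > should mean coding by contrasts and 2 coding by dummy variables):
>> >> >> >
>> >> >> > tt <- terms(~(X1+X2+X3)^3 - X1:X2)
>> >> >> > attr(tt, "factors")["X3", "X1:X2:X3"]
>> >> >> > # 2 here, i.e. dummy coding for X3 in the three-way term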
>> >> >> >
>> >> >> > I do not have a copy of this book, and I suspect most R users do
>> >> >> > not either. Thankfully, however, some of the pages describing this
>> >> >> > behavior were available as part of Amazon's "Look Inside"
>> >> >> > feature--but if not for that, I would have no idea what heuristic R
>> >> >> > was using. Since those pages could be made unavailable by Amazon at
>> >> >> > any time, at the very least we have a problem with a lack of
>> >> >> > documentation.
>> >> >> >
>> >> >> > However, I still believe there is a bug when comparing R's
>> >> >> > implementation to the heuristic described in the book. From
>> >> >> > Statistical Models in S, pages 38-39:
>> >> >> >
>> >> >> > "Suppose F_j is any factor included in term T_i. Let T_{i(j)}
>> >> >> > denote the margin of T_i for factor F_j--that is, the term obtained
>> >> >> > by dropping F_j from T_i. We say that T_{i(j)} has appeared in the
>> >> >> > formula if there is some term T_i' for i' < i such that T_i'
>> >> >> > contains all the factors appearing in T_{i(j)}. The usual case is
>> >> >> > that T_{i(j)} itself is one of the preceding terms. Then F_j is
>> >> >> > coded by contrasts if T_{i(j)} has appeared in the formula and by
>> >> >> > dummy variables if it has not"
>> >> >> >
>> >> >> > Here, F_j refers to a factor (variable) in a model and not a
>> >> >> > categorical factor, as specified later in that section (page 40):
>> >> >> > "Numeric variables appear in the computations as themselves,
>> >> >> > uncoded. Therefore, the rule does not do anything special for them,
>> >> >> > and it remains valid, in a trivial sense, whenever any of the F_j
>> >> >> > is numeric rather than categorical."
>> >> >> >
>> >> >> > Going back to my original example with three variables: X1
>> >> >> > (numeric), X2 (numeric), X3 (categorical). This heuristic
>> >> >> > prescribes encoding X1:X2:X3 with contrasts as long as X1:X2,
>> >> >> > X1:X3, and X2:X3 exist in the formula. When any of the preceding
>> >> >> > terms is missing, this heuristic tells us to use dummy variables
>> >> >> > to encode the interaction (e.g. "F_j [the interaction term] is
>> >> >> > coded ... by dummy variables if it [any of the marginal terms
>> >> >> > obtained by dropping a single factor in the interaction] has not
>> >> >> > [appeared in the formula]"). However, the example I gave
>> >> >> > demonstrated that this dummy variable encoding only occurs for the
>> >> >> > model where the missing term is the numeric-numeric interaction,
>> >> >> > "~(X1+X2+X3)^3-X1:X2". Otherwise, the interaction term X1:X2:X3 is
>> >> >> > encoded by contrasts, not dummy variables. This is inconsistent
>> >> >> > with the description of the intended behavior given in the book.
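>> >> >> >
>> >> >> > A sketch that makes the inconsistency concrete, using the same
>> >> >> > "factors" attribute as above (1 = contrasts, 2 = dummy variables):
>> >> >> >
>> >> >> > for (f in list(~(X1+X2+X3)^3-X1:X3,
>> >> >> >                ~(X1+X2+X3)^3-X2:X3,
>> >> >> >                ~(X1+X2+X3)^3-X1:X2))
>> >> >> >   print(attr(terms(f), "factors")["X3", "X1:X2:X3"])
>> >> >> > # prints 1, 1, 2 -- whereas the heuristic, as read above, would
>> >> >> > # give 2 in all three cases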
>> >> >> >
>> >> >> > Best regards,
>> >> >> > Tyler
>> >> >> >
>> >> >> >
>> >> >> > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate
>> >> >> > <arietencate at gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hello Tyler,
>> >> >> >>
>> >> >> >> I want to bring to your attention the following document: "What
>> >> >> >> happens if you omit the main effect in a regression model with an
>> >> >> >> interaction?"
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-omit-the-main-effect-in-a-regression-model-with-an-interaction).
>> >> >> >> This gives a useful review of the problem. Your example is Case
>> >> >> >> 2: a
>> >> >> >> continuous and a categorical regressor.
>> >> >> >>
>> >> >> >> The numerical examples are coded in Stata, and they give the same
>> >> >> >> result as in R. Hence, if this is a bug in R then it is also a bug
>> >> >> >> in Stata. That seems very unlikely.
>> >> >> >>
>> >> >> >> Here is a simulation in R of the above mentioned Case 2 in Stata:
>> >> >> >>
>> >> >> >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4"))
>> >> >> >> print("Full model")
>> >> >> >> print(model.matrix(~(socst+grp)^2 ,data=df))
>> >> >> >> print("Example 2.1: drop socst")
>> >> >> >> print(model.matrix(~(socst+grp)^2 -socst ,data=df))
>> >> >> >> print("Example 2.2: drop grp")
>> >> >> >> print(model.matrix(~(socst+grp)^2 -grp ,data=df))
>> >> >> >>
>> >> >> >> This gives indeed the following regressors:
>> >> >> >>
>> >> >> >> "Full model"
>> >> >> >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4
>> >> >> >> "Example 2.1: drop socst"
>> >> >> >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4
>> >> >> >> "Example 2.2: drop grp"
>> >> >> >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4
>> >> >> >>
>> >> >> >> There is a little bit of R documentation about this, based on the
>> >> >> >> concept of marginality, which typically forbids a model having an
>> >> >> >> interaction but not the corresponding main effects. (You might see
>> >> >> >> the references in
>> >> >> >> https://en.wikipedia.org/wiki/Principle_of_marginality )
>> >> >> >>     See "An Introduction to R", by Venables and Smith and the R
>> >> >> >> Core Team. At the bottom of page 52 (PDF: 57) it says: "Although
>> >> >> >> the details are complicated, model formulae in R will normally
>> >> >> >> generate the models that an expert statistician would expect,
>> >> >> >> provided that marginality is preserved. Fitting, for [a contrary]
>> >> >> >> example, a model with an interaction but not the corresponding
>> >> >> >> main effects will in general lead to surprising results ....".
>> >> >> >>     The Reference Manual states that the R functions dropterm()
>> >> >> >> and addterm() drop or add, respectively, only terms such that
>> >> >> >> marginality is preserved.
>> >> >> >>
>> >> >> >> Finally, about your singular matrix t(mm)%*%mm. This is in fact
>> >> >> >> Example 2.1 in Case 2 discussed above. As discussed there, in
>> >> >> >> Stata and in R the drop of the continuous variable has no effect
>> >> >> >> on the degrees of freedom here: it is just a reparameterisation
>> >> >> >> of the full model, protecting you against losing marginality...
>> >> >> >> Hence the model.matrix 'mm' is still square and nonsingular after
>> >> >> >> the drop of X1, unless of course a row is removed from the matrix
>> >> >> >> 'design' before creating 'mm'.
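>> >> >> >>
>> >> >> >> A small sketch of the reparameterisation claim, reusing the 'df'
>> >> >> >> from the script above (both model matrices should span the same
>> >> >> >> column space and hence have the same rank):
>> >> >> >>
>> >> >> >> mm1 <- model.matrix(~(socst+grp)^2        ,data=df)
>> >> >> >> mm2 <- model.matrix(~(socst+grp)^2 -socst ,data=df)
>> >> >> >> qr(mm1)$rank == qr(mm2)$rank  # expected: TRUE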
>> >> >> >>
>> >> >> >>     Arie
>> >> >> >>
>> >> >> >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <tylermw at gmail.com> wrote:
>> >> >> >> > You could possibly try to explain away the behavior for a
>> >> >> >> > missing main effects term, since without the main effects term
>> >> >> >> > we don't have main effect columns in the model matrix used to
>> >> >> >> > compute the interaction columns. (At best this is undocumented
>> >> >> >> > behavior--I still think it's a bug, as we know how we would
>> >> >> >> > encode the categorical factors if they were in fact present:
>> >> >> >> > it's either specified in contrasts.arg or using the default set
>> >> >> >> > in options.) However, when all the main effects are present, why
>> >> >> >> > would the three-factor interaction column not simply be the
>> >> >> >> > product of the main effect columns? In my example: we know X1,
>> >> >> >> > we know X2, and we know X3. Why does the encoding of X1:X2:X3
>> >> >> >> > depend on whether we specified a two-factor interaction, AND
>> >> >> >> > only change for specific missing interactions?
>> >> >> >> >
>> >> >> >> > In addition, I can use a two-term example similar to yours to
>> >> >> >> > show how this behavior results in a singular covariance matrix
>> >> >> >> > when, given the desired factor encoding, it should not be
>> >> >> >> > singular.
>> >> >> >> >
>> >> >> >> > We start with a full factorial design for a two-level continuous
>> >> >> >> > factor and a three-level categorical factor, and remove a single
>> >> >> >> > row. This design matrix does not leave enough degrees of freedom
>> >> >> >> > to determine goodness-of-fit, but should allow us to obtain
>> >> >> >> > parameter estimates.
>> >> >> >> >
>> >> >> >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C"))
>> >> >> >> >> design = design[-1,]
>> >> >> >> >> design
>> >> >> >> >   X1 X2
>> >> >> >> > 2 -1  A
>> >> >> >> > 3  1  B
>> >> >> >> > 4 -1  B
>> >> >> >> > 5  1  C
>> >> >> >> > 6 -1  C
>> >> >> >> >
>> >> >> >> > Here, we first calculate the model matrix for the full model,
>> >> >> >> > and then manually remove the X1 column from the model matrix.
>> >> >> >> > This gives us the model matrix one would expect if X1 were
>> >> >> >> > removed from the model. We then successfully calculate the
>> >> >> >> > covariance matrix.
>> >> >> >> >
>> >> >> >> >> mm = model.matrix(~(X1+X2)^2,data=design)
>> >> >> >> >> mm
>> >> >> >> >   (Intercept) X1 X2B X2C X1:X2B X1:X2C
>> >> >> >> > 2           1 -1   0   0      0      0
>> >> >> >> > 3           1  1   1   0      1      0
>> >> >> >> > 4           1 -1   1   0     -1      0
>> >> >> >> > 5           1  1   0   1      0      1
>> >> >> >> > 6           1 -1   0   1      0     -1
>> >> >> >> >
>> >> >> >> >> mm = mm[,-2]
>> >> >> >> >> solve(t(mm) %*% mm)
>> >> >> >> >             (Intercept)  X2B  X2C X1:X2B X1:X2C
>> >> >> >> > (Intercept)           1 -1.0 -1.0    0.0    0.0
>> >> >> >> > X2B                  -1  1.5  1.0    0.0    0.0
>> >> >> >> > X2C                  -1  1.0  1.5    0.0    0.0
>> >> >> >> > X1:X2B                0  0.0  0.0    0.5    0.0
>> >> >> >> > X1:X2C                0  0.0  0.0    0.0    0.5
>> >> >> >> >
>> >> >> >> > Here, we see the actual behavior for model.matrix. The
>> >> >> >> > undesired re-coding of the model matrix interaction term makes
>> >> >> >> > the information matrix singular.
>> >> >> >> >
>> >> >> >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design)
>> >> >> >> >> mm
>> >> >> >> >   (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C
>> >> >> >> > 2           1   0   0     -1      0      0
>> >> >> >> > 3           1   1   0      0      1      0
>> >> >> >> > 4           1   1   0      0     -1      0
>> >> >> >> > 5           1   0   1      0      0      1
>> >> >> >> > 6           1   0   1      0      0     -1
>> >> >> >> >
>> >> >> >> >> solve(t(mm) %*% mm)
>> >> >> >> > Error in solve.default(t(mm) %*% mm) : system is computationally
>> >> >> >> > singular: reciprocal condition number = 5.55112e-18
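>> >> >> >> >
>> >> >> >> > A sketch of why the cross-product must be singular here: the
>> >> >> >> > recoded mm has 6 columns but only 5 rows, so the 6 x 6 matrix
>> >> >> >> > t(mm) %*% mm has rank at most 5.
>> >> >> >> >
>> >> >> >> > dim(mm)                # 5 6
>> >> >> >> > qr(t(mm) %*% mm)$rank  # expected: 5, one short of the 6 needed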
>> >> >> >> >
>> >> >> >> > I still believe this is a bug.
>> >> >> >> >
>> >> >> >> > Best regards,
>> >> >> >> > Tyler Morgan-Wall
>> >> >> >> >
>> >> >> >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate
>> >> >> >> > <arietencate at gmail.com>
>> >> >> >> > wrote:
>> >> >> >> >
>> >> >> >> >> I think it is not a bug. It is a general property of
>> >> >> >> >> interactions. This property is best observed if all variables
>> >> >> >> >> are factors (qualitative).
>> >> >> >> >>
>> >> >> >> >> For example, you have three variables (factors). You ask for as
>> >> >> >> >> many interactions as possible, except an interaction term
>> >> >> >> >> between two particular variables. When this interaction is not
>> >> >> >> >> a constant, it is different for different values of the
>> >> >> >> >> remaining variable. More precisely: for all values of that
>> >> >> >> >> variable. In other words: you have a three-way interaction,
>> >> >> >> >> with all values of that variable.
>> >> >> >> >>
>> >> >> >> >> An even smaller example is the following script with only two
>> >> >> >> >> variables, each being a factor:
>> >> >> >> >>
>> >> >> >> >>  df <- expand.grid(X1=c("p","q"), X2=c("A","B","C"))
>> >> >> >> >>  print(model.matrix(~(X1+X2)^2    ,data=df))
>> >> >> >> >>  print(model.matrix(~(X1+X2)^2 -X1,data=df))
>> >> >> >> >>  print(model.matrix(~(X1+X2)^2 -X2,data=df))
>> >> >> >> >>
>> >> >> >> >> The result is:
>> >> >> >> >>
>> >> >> >> >>   (Intercept) X1q X2B X2C X1q:X2B X1q:X2C
>> >> >> >> >> 1           1   0   0   0       0       0
>> >> >> >> >> 2           1   1   0   0       0       0
>> >> >> >> >> 3           1   0   1   0       0       0
>> >> >> >> >> 4           1   1   1   0       1       0
>> >> >> >> >> 5           1   0   0   1       0       0
>> >> >> >> >> 6           1   1   0   1       0       1
>> >> >> >> >>
>> >> >> >> >>   (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C
>> >> >> >> >> 1           1   0   0       0       0       0
>> >> >> >> >> 2           1   0   0       1       0       0
>> >> >> >> >> 3           1   1   0       0       0       0
>> >> >> >> >> 4           1   1   0       0       1       0
>> >> >> >> >> 5           1   0   1       0       0       0
>> >> >> >> >> 6           1   0   1       0       0       1
>> >> >> >> >>
>> >> >> >> >>   (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C
>> >> >> >> >> 1           1   0       0       0       0       0
>> >> >> >> >> 2           1   1       0       0       0       0
>> >> >> >> >> 3           1   0       1       0       0       0
>> >> >> >> >> 4           1   1       0       1       0       0
>> >> >> >> >> 5           1   0       0       0       1       0
>> >> >> >> >> 6           1   1       0       0       0       1
>> >> >> >> >>
>> >> >> >> >> Thus, in the second result, we have no main effect of X1.
>> >> >> >> >> Instead, the effect of X1 depends on the value of X2: either A
>> >> >> >> >> or B or C. In fact, this is a two-way interaction, including
>> >> >> >> >> all three values of X2. In the third result, we have no main
>> >> >> >> >> effect of X2. The effect of X2 depends on the value of X1:
>> >> >> >> >> either p or q.
>> >> >> >> >>
>> >> >> >> >> A complicating element with your example seems to be that your
>> >> >> >> >> X1 and X2 are not factors.
>> >> >> >> >>
>> >> >> >> >>    Arie
>> >> >> >> >>
>> >> >> >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <tylermw at gmail.com>
>> >> >> >> >> wrote:
>> >> >> >> >> > Hi,
>> >> >> >> >> >
>> >> >> >> >> > I recently ran into an inconsistency in the way
>> >> >> >> >> > model.matrix.default handles factor encoding for higher-level
>> >> >> >> >> > interactions with categorical variables when the full
>> >> >> >> >> > hierarchy of effects is not present. Depending on which
>> >> >> >> >> > lower-level interactions are specified, the factor encoding
>> >> >> >> >> > changes for a higher-level interaction. Consider the
>> >> >> >> >> > following minimal reproducible example:
>> >> >> >> >> >
>> >> >> >> >> > --------------
>> >> >> >> >> >
>> >> >> >> >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))
>> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix)
>> >> >> >> >> >    (Intercept) X1 X2 X3B X3C X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C
>> >> >> >> >> > 1            1  1  1   0   0     1      0      0      0      0         0         0
>> >> >> >> >> > 2            1 -1  1   0   0    -1      0      0      0      0         0         0
>> >> >> >> >> > 3            1  1 -1   0   0    -1      0      0      0      0         0         0
>> >> >> >> >> > 4            1 -1 -1   0   0     1      0      0      0      0         0         0
>> >> >> >> >> > 5            1  1  1   1   0     1      1      0      1      0         1         0
>> >> >> >> >> > 6            1 -1  1   1   0    -1     -1      0      1      0        -1         0
>> >> >> >> >> > 7            1  1 -1   1   0    -1      1      0     -1      0        -1         0
>> >> >> >> >> > 8            1 -1 -1   1   0     1     -1      0     -1      0         1         0
>> >> >> >> >> > 9            1  1  1   0   1     1      0      1      0      1         0         1
>> >> >> >> >> > 10           1 -1  1   0   1    -1      0     -1      0      1         0        -1
>> >> >> >> >> > 11           1  1 -1   0   1    -1      0      1      0     -1         0        -1
>> >> >> >> >> > 12           1 -1 -1   0   1     1      0     -1      0     -1         0         1
>> >> >> >> >> > attr(,"assign")
>> >> >> >> >> >  [1] 0 1 2 3 3 4 5 5 6 6 7 7
>> >> >> >> >> > attr(,"contrasts")
>> >> >> >> >> > attr(,"contrasts")$X3
>> >> >> >> >> > [1] "contr.treatment"
>> >> >> >> >> >
>> >> >> >> >> > --------------
>> >> >> >> >> >
>> >> >> >> >> > Specifying the full hierarchy gives us what we expect: the
>> >> >> >> >> > interaction columns are simply calculated from the product of
>> >> >> >> >> > the main effect columns. The interaction term X1:X2:X3 gives
>> >> >> >> >> > us two columns in the model matrix, X1:X2:X3B and X1:X2:X3C,
>> >> >> >> >> > matching the products of the main effects.
>> >> >> >> >> >
>> >> >> >> >> > If we remove either the X2:X3 interaction or the X1:X3
>> >> >> >> >> > interaction, we get what we would expect for the X1:X2:X3
>> >> >> >> >> > interaction, but when we remove the X1:X2 interaction the
>> >> >> >> >> > encoding for X1:X2:X3 changes completely:
>> >> >> >> >> >
>> >> >> >> >> > --------------
>> >> >> >> >> >
>> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix)
>> >> >> >> >> >    (Intercept) X1 X2 X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C
>> >> >> >> >> > 1            1  1  1   0   0     1      0      0         0         0
>> >> >> >> >> > 2            1 -1  1   0   0    -1      0      0         0         0
>> >> >> >> >> > 3            1  1 -1   0   0    -1      0      0         0         0
>> >> >> >> >> > 4            1 -1 -1   0   0     1      0      0         0         0
>> >> >> >> >> > 5            1  1  1   1   0     1      1      0         1         0
>> >> >> >> >> > 6            1 -1  1   1   0    -1      1      0        -1         0
>> >> >> >> >> > 7            1  1 -1   1   0    -1     -1      0        -1         0
>> >> >> >> >> > 8            1 -1 -1   1   0     1     -1      0         1         0
>> >> >> >> >> > 9            1  1  1   0   1     1      0      1         0         1
>> >> >> >> >> > 10           1 -1  1   0   1    -1      0      1         0        -1
>> >> >> >> >> > 11           1  1 -1   0   1    -1      0     -1         0        -1
>> >> >> >> >> > 12           1 -1 -1   0   1     1      0     -1         0         1
>> >> >> >> >> > attr(,"assign")
>> >> >> >> >> >  [1] 0 1 2 3 3 4 5 5 6 6
>> >> >> >> >> > attr(,"contrasts")
>> >> >> >> >> > attr(,"contrasts")$X3
>> >> >> >> >> > [1] "contr.treatment"
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix)
>> >> >> >> >> >    (Intercept) X1 X2 X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C
>> >> >> >> >> > 1            1  1  1   0   0     1      0      0         0         0
>> >> >> >> >> > 2            1 -1  1   0   0    -1      0      0         0         0
>> >> >> >> >> > 3            1  1 -1   0   0    -1      0      0         0         0
>> >> >> >> >> > 4            1 -1 -1   0   0     1      0      0         0         0
>> >> >> >> >> > 5            1  1  1   1   0     1      1      0         1         0
>> >> >> >> >> > 6            1 -1  1   1   0    -1     -1      0        -1         0
>> >> >> >> >> > 7            1  1 -1   1   0    -1      1      0        -1         0
>> >> >> >> >> > 8            1 -1 -1   1   0     1     -1      0         1         0
>> >> >> >> >> > 9            1  1  1   0   1     1      0      1         0         1
>> >> >> >> >> > 10           1 -1  1   0   1    -1      0     -1         0        -1
>> >> >> >> >> > 11           1  1 -1   0   1    -1      0      1         0        -1
>> >> >> >> >> > 12           1 -1 -1   0   1     1      0     -1         0         1
>> >> >> >> >> > attr(,"assign")
>> >> >> >> >> >  [1] 0 1 2 3 3 4 5 5 6 6
>> >> >> >> >> > attr(,"contrasts")
>> >> >> >> >> > attr(,"contrasts")$X3
>> >> >> >> >> > [1] "contr.treatment"
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix)
>> >> >> >> >> >    (Intercept) X1 X2 X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C
>> >> >> >> >> > 1            1  1  1   0   0      0      0      0      0         1         0         0
>> >> >> >> >> > 2            1 -1  1   0   0      0      0      0      0        -1         0         0
>> >> >> >> >> > 3            1  1 -1   0   0      0      0      0      0        -1         0         0
>> >> >> >> >> > 4            1 -1 -1   0   0      0      0      0      0         1         0         0
>> >> >> >> >> > 5            1  1  1   1   0      1      0      1      0         0         1         0
>> >> >> >> >> > 6            1 -1  1   1   0     -1      0      1      0         0        -1         0
>> >> >> >> >> > 7            1  1 -1   1   0      1      0     -1      0         0        -1         0
>> >> >> >> >> > 8            1 -1 -1   1   0     -1      0     -1      0         0         1         0
>> >> >> >> >> > 9            1  1  1   0   1      0      1      0      1         0         0         1
>> >> >> >> >> > 10           1 -1  1   0   1      0     -1      0      1         0         0        -1
>> >> >> >> >> > 11           1  1 -1   0   1      0      1      0     -1         0         0        -1
>> >> >> >> >> > 12           1 -1 -1   0   1      0     -1      0     -1         0         0         1
>> >> >> >> >> > attr(,"assign")
>> >> >> >> >> >  [1] 0 1 2 3 3 4 4 5 5 6 6 6
>> >> >> >> >> > attr(,"contrasts")
>> >> >> >> >> > attr(,"contrasts")$X3
>> >> >> >> >> > [1] "contr.treatment"
>> >> >> >> >> >
>> >> >> >> >> > --------------
>> >> >> >> >> >
>> >> >> >> >> > Here, we see that the encoding for the interaction X1:X2:X3
>> >> >> >> >> > is now the interaction of X1 and X2 with a new encoding for
>> >> >> >> >> > X3, where each factor level is represented by its own column.
>> >> >> >> >> > I would expect, given the two-column dummy variable encoding
>> >> >> >> >> > for X3, that the X1:X2:X3 term would also be two columns
>> >> >> >> >> > regardless of which two-factor interactions we also
>> >> >> >> >> > specified, but in this case it switches to three. If other
>> >> >> >> >> > two-factor interactions are missing in addition to X1:X2,
>> >> >> >> >> > this issue still occurs. This also happens regardless of the
>> >> >> >> >> > contrast specified in contrasts.arg for X3 (see the sketch
>> >> >> >> >> > below). I don't see any reasoning for this behavior given in
>> >> >> >> >> > the documentation, so I suspect it is a bug.
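>> >> >> >> >> >
>> >> >> >> >> > A sketch of the contrasts.arg point (assuming runmatrix as
>> >> >> >> >> > above): the extra column should appear even when a different
>> >> >> >> >> > contrast is requested for X3.
>> >> >> >> >> >
>> >> >> >> >> > colnames(model.matrix(~(X1+X2+X3)^3-X1:X2, data=runmatrix,
>> >> >> >> >> >                       contrasts.arg=list(X3="contr.sum")))
>> >> >> >> >> > # expected to still end in three X1:X2:X3 columns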
>> >> >> >> >> >
>> >> >> >> >> > Best regards,
>> >> >> >> >> > Tyler Morgan-Wall
>> >> >> >> >> >
>> >> >> >> >> >         [[alternative HTML version deleted]]
>> >> >> >> >> >