[Rd] Issues with drop.terms
bbo|ker @end|ng |rom gm@||@com
Tue Aug 24 00:03:45 CEST 2021
Small follow-up: (1) in order for lm() to actually work you need
keep.response=TRUE in the drop.terms() call (I realize that this is
*not* the problem in your example)
test4 <- terms(mpg ~ hp + I(cyl==4) + disp + wt )
check4 <- drop.terms(test4, 3, keep.response = TRUE)
lm( check4, data=mtcars)
(2) I'm ambivalent about your "We can argue that the user should have
used I(cyl==4), but very many won't." argument. This is the ever-present
"document precisely and require users to know and follow the
documentation" vs. "try to protect users from themselves" debate -
taking either side to an extreme is (IMO) unproductive. I don't know how
hard it would be to make drop.terms() **not** drop parentheses, but it
seems like it may be very hard/low-level. My vote would be to see if
there is a reasonably robust way to detect these constructions and
**warn** about them.
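For what it's worth, a minimal sketch of what such a check might look like is below. The helper names risky_labels() and warn_risky_terms() are hypothetical (they are not part of stats), and the operator list is an assumption about which constructions change meaning when term labels are pasted back together with "+"; it is not exhaustive.

```r
## Hypothetical helpers, not part of stats: flag term labels whose top-level
## operator binds more loosely than "+", so that pasting the labels back
## together with "+" (as reformulate() does) would change their meaning.
risky_labels <- function(labels) {
  ## assumed, non-exhaustive list of comparison/logical operators
  risky_ops <- c("==", "!=", "<", ">", "<=", ">=", "!", "&", "|", "&&", "||")
  vapply(labels, function(lab) {
    expr <- str2lang(lab)                 # parse one label (needs R >= 3.6.0)
    is.call(expr) && as.character(expr[[1L]])[1L] %in% risky_ops
  }, logical(1), USE.NAMES = FALSE)
}

warn_risky_terms <- function(termobj) {
  labels <- attr(termobj, "term.labels")
  bad <- labels[risky_labels(labels)]
  if (length(bad))
    warning("terms may need I() or parentheses: ",
            paste(bad, collapse = ", "), call. = FALSE)
  invisible(bad)                          # return the flagged labels, if any
}

risky_labels(c("hp", "cyl == 4", "I(cyl == 4)", "x1:x2"))
```

A label wrapped in I() or in any other function call is not flagged, because its top-level operator is then the function itself.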
I have probably asked about this before, but if anyone knows of
useful materials that go into more details about the definitions and
implementation of model matrix/terms/etc. machinery, *beyond* the
appropriate chapter of "Statistical Models in S" (Becker/Chambers white
book), *or* the source code itself, I would love some pointers ...
On 8/23/21 10:36 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
> This is a follow-up to my earlier note on [.terms. Based on a couple days' work getting
> the survival package to work around issues, this will hopefully be more concise and
> better expressed than the prior note.
> 1.
> test1 <- terms( y ~ x1:x2 + x3)
> check <- drop.terms(termobj =test1, dropx = 1)
> ## ~x1:x2
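A quick way to see why dropx = 1 leaves x1:x2 behind is to look at the term.labels attribute directly. This is only an illustration using the same formula: terms() sorts terms by interaction order, so the main effect x3 comes before x1:x2 regardless of how the formula was typed.

```r
test1 <- terms(y ~ x1:x2 + x3)
attr(test1, "term.labels")   # "x3" "x1:x2": main effects sort first
attr(test1, "order")         # 1 2
## look up the position by label instead of guessing it
pos <- match("x3", attr(test1, "term.labels"))
pos                          # 1
```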
> The documentation for the dropx argument is "vector of positions of variables to drop from
> the right hand side of the model", but it is not clear what "positions" is. I originally
> assumed "the order in the formula as typed", but was wrong. I suggest adding a line
> "Position refers to the order of terms in the term.labels attribute of the terms object,
> which is also the order they will appear in a coefficient vector (not counting the
> intercept)."
> 2.
> library(splines)  # ns() comes from the splines package
> test2 <- terms(model.frame(mpg ~ offset(cyl) + ns(hp, df=3) + disp + wt, data=mtcars))
> check2 <- drop.terms(test2, dropx = 2)
> ## ~ns(hp, df=3) + wt
> One side effect of how drop.terms is implemented, and one that I suspect was not intended,
> is that offsets are completely ignored. The above drops both the offset and the disp
> term from the formula. The dataClasses and predvars attributes of the result are also
> incorrect: they have lost the ns() term rather than the disp term,
> so the results of predict will be incorrect.
> attr(check2, "predvars")
> ## list(offset(cyl), disp, wt)
> Question: should the function be updated to not drop offsets? If not a line needs to be
> added to the help file. The handling of predvars needs to be fixed regardless.
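To make the bookkeeping concrete, the sketch below shows where the offset sits in a terms object. It uses a simplified formula (plain hp in place of ns(hp, df=3), so that splines is not needed); that substitution is an assumption made for illustration. The offset is one of the variables but has no entry in term.labels, so positions in term.labels and positions in the variables list are shifted relative to each other.

```r
tt <- terms(mpg ~ offset(cyl) + hp + disp + wt)
attr(tt, "term.labels")               # "hp" "disp" "wt": no offset here
vars <- as.character(attr(tt, "variables"))[-1L]   # drop the leading "list"
vars                                  # "mpg" "offset(cyl)" "hp" "disp" "wt"
attr(tt, "offset")                    # 2: an index into vars, not term.labels
## map each term label to its position among the variables
match(attr(tt, "term.labels"), vars)  # 3 4 5
```

Anything that subsets per-variable attributes by term position would presumably have to account for this shift, as well as for the response.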
> 3.
> test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt )
> check3 <- drop.terms(test3, 3)
> lm( check3, data=mtcars) # fails
> The drop.terms action has lost the () around the logical expression, which leads to an
> invalid formula. We can argue that the user should have used I(cyl==4), but very many won't.
> 4. As a footnote, more confusion (for me) is generated by the fact that the "specials"
> attribute of a formula does not use the numbering discussed in 1 above. I had solved
> this issue long ago in the untangle.specials function; long enough ago that I forgot I had
> solved it, and just wasted a day rediscovering that fact.
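As a small illustration of the mismatch: the specials attribute indexes the variables list (where the response counts as position 1), while dropx indexes term.labels. The formula and variable names below are made up; terms() does not evaluate them, so strata() need not be defined.

```r
tt <- terms(y ~ x + strata(g), specials = "strata")
attr(tt, "specials")$strata                   # 3: y, x, strata(g)
match("strata(g)", attr(tt, "term.labels"))   # 2: x, strata(g)
## convert a specials index to a term.labels position via the label
vars <- as.character(attr(tt, "variables"))[-1L]
match(vars[attr(tt, "specials")$strata], attr(tt, "term.labels"))  # 2
```

The label-based conversion only works when the special appears as a term by itself; for a special inside an interaction one would have to look at the columns of the factors attribute instead.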
> I can create a patch for 1 and 2 (once we answer my question), but a fix for 3 is not
> clear to me. It currently leads to failure in a coxph call that includes a strata so I am
> directly interested in a solution; e.g., coxph(Surv(time, status) ~ age + (ph.ecog==2) +
> strata(inst), data=lung)
> Terry T
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics