[Rd] Issues with drop.terms

Therneau, Terry M., Ph.D. therne@u @end|ng |rom m@yo@edu
Mon Aug 23 16:36:29 CEST 2021


This is a follow-up to my earlier note on [.terms.  Based on a couple of days' work getting 
the survival package to work around these issues, this will hopefully be more concise and 
better expressed than the prior note.

1.
test1 <- terms( y ~ x1:x2 + x3)
check <- drop.terms(termobj =test1, dropx = 1)
formula(check)
## ~x1:x2

The documentation for the dropx argument is "vector of positions of variables to drop from 
the right hand side of the model", but it is not clear what "positions" means.  I originally 
assumed "the order in the formula as typed", but was wrong.  I suggest adding a line: 
"Position refers to the order of terms in the term.labels attribute of the terms object, 
which is also the order in which they will appear in a coefficient vector (not counting the 
intercept)."
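
A minimal sketch of the ordering issue: terms() sorts labels by interaction order, so the
main effect x3 comes before x1:x2 even though it was typed last. The helper drop_by_label
is hypothetical (it is not part of stats); it only looks up positions in term.labels so
the caller does not have to guess them.

```r
test1 <- terms(y ~ x1:x2 + x3)

# Order used by dropx: main effects first, then interactions
attr(test1, "term.labels")
## "x3"    "x1:x2"

# Hypothetical helper: drop terms by label rather than by position
drop_by_label <- function(termobj, labels) {
  pos <- match(labels, attr(termobj, "term.labels"))
  if (anyNA(pos)) stop("label not found in term.labels")
  drop.terms(termobj, dropx = pos)
}

check1 <- drop_by_label(test1, "x3")
attr(check1, "term.labels")
## "x1:x2"
```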

2.
library(splines)
test2 <- terms(model.frame(mpg ~  offset(cyl) + ns(hp, df=3) + disp + wt, data=mtcars))
check2 <- drop.terms(test2,  dropx = 2)
formula(check2)
## ~ns(hp, df=3) + wt

One side effect of how drop.terms is implemented, and one that I suspect was not intended, 
is that offsets are completely ignored.  The above drops both the offset and the disp 
term from the formula.  The dataClasses and predvars attributes of the result are also 
incorrect: they have lost the ns() term rather than the disp term, so
the results of predict will be incorrect.

attr(check2, "predvars")
##    list(offset(cyl), disp, wt)

Question: should the function be updated to not drop offsets?  If not, a line needs to be 
added to the help file.  The handling of predvars needs to be fixed regardless.
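
One possible workaround, sketched under assumptions: rebuild the formula from term.labels
plus the offset expressions recorded in the "offset" attribute. The helper drop_keep_offset
is hypothetical, it always keeps the response (unlike the drop.terms default), and it does
not attempt to repair predvars or dataClasses, so it applies to terms built from a plain
formula rather than from a model frame.

```r
# Hypothetical helper: drop by position in term.labels, but retain offsets
drop_keep_offset <- function(termobj, dropx) {
  vars   <- attr(termobj, "variables")          # call: list(response, v1, v2, ...)
  labels <- attr(termobj, "term.labels")[-dropx]
  off    <- attr(termobj, "offset")             # indices into vars, or NULL
  # vars[[1]] is the symbol `list`, hence the + 1L
  off_labels <- vapply(off, function(i) deparse1(vars[[i + 1L]]), character(1))
  resp <- attr(termobj, "response")
  newf <- reformulate(c(off_labels, labels),
                      response  = if (resp > 0) vars[[resp + 1L]],
                      intercept = attr(termobj, "intercept") == 1,
                      env       = environment(termobj))
  terms(newf)
}

tt  <- terms(mpg ~ offset(cyl) + ns(hp, df = 3) + disp + wt)
res <- drop_keep_offset(tt, dropx = 2)   # position 2 in term.labels is disp
formula(res)
attr(res, "offset")                      # the offset is still recorded
```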

3.
test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt )
check3 <- drop.terms(test3, 3)
formula(check3)
lm( check3, data=mtcars)   # fails

The drop.terms action has lost the () around the logical expression, which leads to an 
invalid formula.  We can argue that the user should have used I(cyl==4), but very many won't.
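
For comparison, a short sketch of the I() form, which keeps the logical expression intact
as a single term label. keep.response = TRUE is added here (an assumption about the
intended use) so that the result can be passed to lm.

```r
test3b  <- terms(mpg ~ hp + I(cyl == 4) + disp + wt)
check3b <- drop.terms(test3b, 3, keep.response = TRUE)   # drops disp

# The I() wrapper survives the round trip through term.labels
attr(check3b, "term.labels")
## "hp"          "I(cyl == 4)" "wt"

fit3b <- lm(check3b, data = mtcars)
coef(fit3b)

# The label stored for the bare parenthesized version, for contrast
attr(terms(mpg ~ hp + (cyl == 4) + disp + wt), "term.labels")[2]
```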

4. As a footnote, more confusion (for me) is generated by the fact that the "specials" 
attribute of a terms object does not use the numbering discussed in 1 above.  I had solved 
this issue long ago in the untangle.specials function; long enough ago that I forgot I had 
solved it, and just wasted a day rediscovering that fact.
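
A small sketch of the two numberings. The "specials" attribute indexes the variables list,
which counts the response, while dropx indexes term.labels. Going through the factors
matrix is one way to translate between them; this is an illustration, not the
untangle.specials code itself.

```r
tt <- terms(y ~ age + strata(inst) + sex, specials = "strata")

# Index into the variables list: y, age, strata(inst), sex
sp <- attr(tt, "specials")$strata
sp
## 3

# Columns of the factors matrix are the term.labels; pick those that
# involve the special variable
fac <- attr(tt, "factors")
pos <- which(fac[sp, ] > 0)
unname(pos)
## 2

attr(drop.terms(tt, dropx = pos), "term.labels")
## "age" "sex"
```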

---

I can create a patch for 1 and 2 (once we answer my question), but a fix for 3 is not 
clear to me.  It currently leads to failure in a coxph call that includes a strata() term, 
so I am directly interested in a solution; e.g.,

coxph(Surv(time, status) ~ age + (ph.ecog==2) + strata(inst), data=lung)

Terry T

-- 

Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
therneau using mayo.edu

"TERR-ree THUR-noh"

