[Rd] Error in [.terms

Therneau, Terry M., Ph.D. therne@u @end|ng |rom m@yo@edu
Fri Oct 4 16:19:51 CEST 2019


Martin,

   There are a couple of issues with [.terms that have bitten my survival code.  At the 
useR conference I promised you a detailed (readable) explanation, and have been lax in 
getting it to you. The error was first pointed out in a Bugzilla note from 2016, by the 
way.  The current survival code works around these issues.

Consider the following formula:

<<testform>>=
library(survival)  # only to get access to the lung data set
test <- Surv(time, status) ~  age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials="strata")
mf <- model.frame(tform, data=lung)
mterm <- terms(mf)
@

The strata term is handled in a special way by coxph, and then needs to be removed from 
the model formula before calling model.matrix.
To do this the code uses essentially the following, which fails for the formula above.

<<strata>>=
strata <- attr(mterm, "specials")$strata - attr(mterm, "response")
X <- model.matrix(mterm[-strata], mf)
@

The root problem is the need for multiple subscripts.
\begin{itemize}
   \item The formula itself has length 5, with `~' as the first element
   \item The variables and predvars attributes are call objects, each a list() with 4 
elements: the response and all 3 predictors
   \item The term.labels attribute omits the response and the offset, so has length 2
   \item The factors attribute has 4 rows and 2 columns
   \item The dataClasses attribute is a character vector of length 4
\end{itemize}
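
These sizes can be checked directly on a terms object.  The chunk below is only a 
minimal sketch using base R: terms() does not evaluate Surv() or strata(), so the 
survival package is not needed for it.  The dataClasses attribute is left out because 
it exists only on the terms of a model frame.

<<checksizes>>=
test  <- Surv(time, status) ~ age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials = "strata")

nvar   <- length(attr(tform, "variables")) - 1L  # drop the leading `list` symbol
nlabel <- length(attr(tform, "term.labels"))     # response and offset are omitted
fdim   <- dim(attr(tform, "factors"))            # rows = variables, cols = terms
c(variables = nvar, term.labels = nlabel,
  factor.rows = fdim[1], factor.cols = fdim[2])
@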

So the ideal result of  mterm[remove the specials] would use subscripts of
\begin{itemize}
   \item [-5] on the formula itself, variables and predvars attributes
   \item [-2] for term.labels
   \item [-4 , -2, drop=FALSE] for factors attribute
   \item [-2] for order attribute
   \item [-4] for the dataClasses attribute
\end{itemize}
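
As a sketch of how those different subscripts can be derived from the single index 
stored in the specials attribute, the chunk below looks up which columns of the factors 
matrix involve the special variable.  The names irow and icol are mine, used only for 
illustration, and this is one possible approach rather than what [.terms does.

<<subscripts>>=
test  <- Surv(time, status) ~ age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials = "strata")

fac  <- attr(tform, "factors")
irow <- attr(tform, "specials")$strata   # position among the variables (rows of factors)
icol <- unname(which(fac[irow, ] > 0))   # terms (columns) that involve that variable

attr(tform, "term.labels")[-icol]        # labels with the strata term removed
fac[-irow, -icol, drop = FALSE]          # factors with its row and column removed
@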

That will recreate the formula that ``would have been'' had there been no strata term.  
Now look at the first portion of the code in models.R:
<<>>=
`[.terms` <- function (termobj, i)
{
     resp <- if (attr(termobj, "response")) termobj[[2L]]
     newformula <- attr(termobj, "term.labels")[i]
     if (length(newformula) == 0L) newformula <- "1"
     newformula <- reformulate(newformula, resp, attr(termobj, "intercept"),
                               environment(termobj))
     result <- terms(newformula, specials = names(attr(termobj, "specials")))

     # Edit the optional attributes
}
@

The use of reformulate() is a nice trick.  However, the index reported in the specials 
attribute is generated with reference to the variables
attribute, or equivalently the row names of the factors attribute, not with respect to the 
term.labels attribute. For consistency the second line should instead be
<<>>=
newformula <- row.names(attr(termobj, "factors"))[i]
@
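
The mismatch is easy to see by applying the same index to both vectors.  This is only 
an illustration with base R; by_label and by_row are names I made up for it.  Note that 
the row names also include the response and the offset, so a real patch would have to 
deal with those before calling reformulate(), which is not shown here.

<<indexmismatch>>=
test  <- Surv(time, status) ~ age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials = "strata")
i     <- -attr(tform, "specials")$strata           # negative index of the strata variable

by_label <- attr(tform, "term.labels")[i]          # index is out of range: nothing dropped
by_row   <- row.names(attr(tform, "factors"))[i]   # the strata variable is dropped
by_label
by_row
@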

Of course, this will break code for anyone else who has used [.terms and, like me, has 
been adjusting for the ``response is counted in specials but
not in term.labels'' feature.  R core will have to discuss/decide what is the right thing 
to do, and I'll adapt.

The reformulate trick breaks in another way, one that only appeared on my radar this week 
via a formula like the following.

<<form2>>=
Surv(time, status) ~ age + (sex=='male') + strata(inst)
@

In both the term.labels attribute and the row/col names of the factors attribute the 
parentheses disappear, and the result of the reformulate call is not a proper formula.  
Because + binds tighter than ==, the pasted text parses as a comparison between two sums, 
leading to an error message that will confuse most users. We can argue, and I probably 
would, that the user should have used I(sex=='male').  But they didn't, and without the 
I() it is a legal formula, or at least one that currently works.  Fixing this issue is a 
lot harder.
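
A small sketch of the precedence problem, again with base R only.  Pasting the labels 
with " + " is meant to mimic roughly what reformulate() does with them; the variable 
names are mine.

<<parenloss>>=
f2   <- Surv(time, status) ~ age + (sex == 'male') + strata(inst)
labs <- attr(terms(f2, specials = "strata"), "term.labels")
labs                                      # the parentheses are gone

pasted <- paste(labs, collapse = " + ")   # rebuild the right-hand side as text
expr   <- parse(text = pasted)[[1]]
top    <- as.character(expr[[1]])         # top-level operator of the parsed text
top
@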

An offset term causes issues in the 'Edit the optional attributes' part of the routine as 
well.  If you and/or R core will tell me what you think
the code should do, I'll create a patch.  My vote would be to use rownames per the above 
and ignore the () edge case.

The same basic code appears in drop.terms, by the way.

Terry T.

