[R] Time-dependent coefficients in a Cox model with categorical variants

David Winsemius dwinsemius at comcast.net
Mon Jan 15 21:12:22 CET 2018


> On Jan 15, 2018, at 12:58 AM, Max Shell <archerrish at gmail.com> wrote:
> 
> Suppose I have a dataset contain three variants, looks like
>> head(dta)
> 
>  Sex    tumorsize    Histology       time         status
>    0            1.5            2              12.1000             0
>    1            1.8            1              38.4000             0
> .....................
> 
> Sex: 1 for male; 0 for female., two levels
> Histology: 1 for SqCC; 2 for High risk AC; 3 for low risk AC, three levels
> Now I need to get a Time-dependent coefficients cox fit:
> 
> library(survival)
> for(i in c(1,3) dta[,i] <- factor(dta[,i])
> fit <-
>  coxph(
>    Surv(time, status) ~  Sex + tumorsize +  Histology + tt(Histology),
>    data = dta,
>    tt = function(x, t, ...) x * log(t)
>  )
> 
> But  I keep gettting this error says:
> 
> Error in if (any(infs)) warning(paste("Loglik converged before variable ",  :
>  missing value where TRUE/FALSE needed
> In addition: Warning message:
> In Ops.factor(x, log(t)) : ‘*’ not meaningful for factors.

The error message seems pretty clear. You are passing a factor to the x parameter of the tt function, and then you are attempting to multiply that value times log(t). You are lucky that it was a factor and not jsut an integer because then you might not have gotten an error or a warning.  I worry that your hopes of separate estimates may not be easily supportable by the tt-mechanism. However, the example of its use in the help page for `coxph` shows a spline function being passed and the boundary knots and weights mush estimated, so my fears may be over-blown. The knot locations and weights are not reported in the print.coxph but it doesn't look too difficult to extract then from the attributes of the model. 

You might look at the mechanism for estimation of spline component effects to see if you can learn how to estimate multiple components:

> coef( coxph(Surv(time, status) ~ ph.ecog + tt(age), data=lung,
+      tt=function(x,t,...) pspline(x + t/365.25))  )
           ph.ecog  ps(x + t/365.25)3  ps(x + t/365.25)4  ps(x + t/365.25)5  ps(x + t/365.25)6  ps(x + t/365.25)7 
         0.4528363          0.2426635          0.4876185          0.7796924          1.0160954          1.0765967 
 ps(x + t/365.25)8  ps(x + t/365.25)9 ps(x + t/365.25)10 ps(x + t/365.25)11 ps(x + t/365.25)12 ps(x + t/365.25)13 
         1.0449439          0.9170725          0.9276695          1.1349794          1.2837341          1.7024045 
ps(x + t/365.25)14 
         2.1863712 

> attr( coxph(Surv(time, status) ~ ph.ecog + tt(age), data=lung,
+      tt=function(x,t,...) pspline(x + t/365.25))$terms, "predvars" )
list(Surv(time, status), ph.ecog, pspline(age, nterm = 10, intercept = FALSE, 
    Boundary.knots = c(39.0136892539357, 82.0848733744011), `NA` = NULL))

The coefficients for the spline also get reported as exponentiated values in the summary output. And if you  used a crossing operator in the formula you get some sort of interaction result. Whether it has any sensible interpretation is decision that's above my pay grade.

The code for `pspline` is readily available. It's not even necessary to sue the triple-colon or getAnywhere functions.

-- 
David.
> 
> How can I fix it? I know that the "Sex" and "Histology" are both
> categorical variants. I  want to have a model that have two β(t) = a +
> blog(t) for each histology level.
> Thank you!
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law



More information about the R-help mailing list