[R] Time-dependent coefficients in a Cox model with categorical variants

Thu Jan 18 21:13:56 CET 2018

Offlist... for your information...

It is unfair to suggest that the mailing list participants are at fault for using old software.  Even if the mailing list participants use email programs that can handle HTML, any email that goes through the list gets the formatting stripped, which leaves it damaged to some degree. It might not seem like this because sometimes you CAN see formatting, but that only happens when you are listed in the "To" or "Cc" fields... the rest of the list saw a stripped version regardless of how good their mail program was. Just go look at the archives to confirm this. Net result is the rest of the participants see a more or less damaged version of the discussion/code whenever HTML is used on list.
-- 
Sent from my phone. Please excuse my brevity.

On January 18, 2018 11:38:17 AM PST, "Therneau, Terry M., Ph.D." <therneau at mayo.edu> wrote:
>
>First, as others have said please obey the mailing list rules and turn
>of
>First, as others have said please obey the mailing list rules and turn
>off html, not everyone uses an html email client.
>
>Here is your code, formatted and with line numbers added.  I also fixed
>one error: "y" should be "status".
>
>1. fit0 <- coxph(Surv(futime, status) ~ x1 + x2 + x3, data = data0)
>2. p <- log(predict(fit0, newdata = data1, type = "expected"))
>3. lp <- predict(fit0, newdata = data1, type = "lp")
>4. logbase <- p - lp
>5. fit1 <- glm(status ~ offset(p), family = poisson, data = data1)
>6. fit2 <- glm(status~ lp + offset(logbase), family = poisson, data =
>data1)
>7. group <- cut(lp, c(-Inf, quantile(lp, (1:9) / 10), Inf))
>8. fit3 <- glm(status ~ -1 + group + offset(p), family = poisson, data
>= data1)
>
>The key idea of the paper you referenced is that the counterpart to the
>Hosmer-Lemishow test (wrong if used directly in a Cox model) is to look
>at the predicted values from a Cox model as input to a Poisson
>regression.  That means adding the expected from the Cox model as a
>fixed term in the Poisson.  And like any other poisson that means
>offset(log(expected)) as a term.
>
>The presence of time dependent covariates does nothing to change this,
>per se, since expected for time fixed is the same as for time varying. 
>In practice it does matter, at least philosophically.  Lines 1, 2, 5 do
>this just fine.
>
>If data1 is not the same as data0, a new study say, then the test for
>intercept=0 from fit1 is a test of overall calibration.  Models like
>line 8 try to partition out where any differences actually lie.
>
>The time-dependent covariates part lies in the fact that a single
>subject may be represented by multiple lines in data0 and/or data1.  Do
>you want to collapse that person into a single row before the glm fits?
>If subject "Jones" is represented by 15 lines in the data and "Smith"
>by 2, it does seem a bit unfair to give Jones 15 observations in the
>glm fit.  But full discussion of this is as much philosophy as
>statistics, and is perhaps best done over a beer.
>
>Terry T.
>
>________________________________
>From: Max Shell [archerrish at gmail.com]
>Sent: Wednesday, January 17, 2018 10:25 AM
>To: Therneau, Terry M., Ph.D.
>Subject: Re: Time-dependent coefficients in a Cox model with
>categorical variants
>
>Assessing calibration of Cox model with time-dependent
>coefficients<https://stats.stackexchange.com/questions/323569/assessing-calibration-of-cox-model-with-time-dependent-coefficients>
>
>I am trying to find methods for testing and visualizing calibration to
>Cox models with time-depended coefficients. I have read your nice
>article<http://journals.sagepub.com/doi/10.1177/0962280213497434>. In
>this paper, we can fit three models:
>
>fit0 <- coxph(Surv(futime, status) ~ x1 + x2 + x3, data = data0) p <-
>log(predict(fit0, newdata = data1, type = "expected")) lp <-
>predict(fit0, newdata = data1, type = "lp") logbase <- p - lp fit1 <-
>glm(y ~ offset(p), family = poisson, data = data1) fit2 <- glm(y ~ lp +
>offset(logbase), family = poisson, data = data1) group <- cut(lp,
>c(-Inf, quantile(lp, (1:9) / 10), Inf)) fit3 <- glm(y ~ -1 + group +
>offset(p), family = poisson, data = data1)
>
>Here$B!$(BI simplely use data1$B!!(B<- data0[1:500,]
>
>First, I get following error when running line 5.
>
>Error in eval(predvars, data, env) : object 'y' not found
>
>So I modifited the code by replacing the y as status looks like this:
>
>fit1 <- glm(status ~ offset(p), family = poisson, data = data1) fit2 <-
>glm(status ~ lp + offset(logbase), family = poisson, data = data1)
>group <- cut(lp, c(-Inf, quantile(lp, (1:9) / 10), Inf)) fit3 <-
>glm(status ~ -1 + group + offset(p), family = poisson, data = data1)
>
>Is this replacing correct?
>
>Second, I try to introduce the time-transform use coxph with
>ttparament.
>
>My code is:  fit0 <- coxph(Surv(time, status) ~ x1 + x2 + x3 + tt(x3),
>data = data0, function(x, t, ...) x * t) p <- log(predict(fit0, newdata
>= data1, type = "expected")) lp <- predict(fit0, newdata = data1, type
>= "lp") logbase <- p - lp fit1 <- glm(status ~ offset(p), family =
>poisson, data = data1) fit2 <- glm(status ~ lp + offset(logbase),
>family = poisson, data = data1) group <- cut(lp, c(-Inf, quantile(lp,
>(1:9) / 10), Inf)) fit3 <- glm(status ~ -1 + group + offset(p), family
>= poisson, data = data1)
>
>My questions is:
>
>  *   Is the code above correct?
>*   How to interpret the fit1, fit2, fit3? What's the connection
>between the three models and the calibration of the Cox model?
>*   How to generate the calibration plot using fit3? The article dose
>have a section discuss this, but no code is provided.
>
>Thank you!
>
>On Mon, Jan 15, 2018 at 9:23 PM, Therneau, Terry M., Ph.D.
><therneau at mayo.edu<mailto:therneau at mayo.edu>> wrote:
>The model formula " ~ Histology" knows how to change your 3 level
>categorical variable into two 0/1 dummy variables for a regression
>matrix.  The tt() call is a simple function, however, and ordinary
>multiplication and does not have those powers.  In this case you need
>to do the setup by hand : create your own 0/1 dummy variables and work
>with them.
>
>sqcc <- ifelse(dta$Histology == 'Sqcc', 0, 1)
>hrac <- ifelse(dta$Histology == 'High risk AC', 0, 1)
>fit <- coxph(Surv(time, status) ~ Sex + sqcc + hrac + tt(sqcc) +
>tt(hrac),
>                 data = dta, tt = list(function(x,t, ...) x*log(t),
>                                       function(x, t, ...) x* log(t)))
>
>
>Terry Therneau
>
>PS I've rarely found x*log(t) to be useful, but perhaps you have
>already looked at the cox.zph plots and see that shape.
>
>
>Suppose I have a dataset contain three variants, looks like
>head(dta)
>   Sex    tumorsize    Histology       time         status
>     0            1.5            2              12.1000             0
>     1            1.8            1              38.4000             0
>.....................
>
>Sex: 1 for male; 0 for female., two levels
>Histology: 1 for SqCC; 2 for High risk AC; 3 for low risk AC, three
>levels
>Now I need to get a Time-dependent coefficients cox fit:
>
>library(survival)
>for(i in c(1,3) dta[,i] <- factor(dta[,i])
>fit <-
>   coxph(
>    Surv(time, status) ~  Sex + tumorsize +  Histology + tt(Histology),
>     data = dta,
>     tt = function(x, t, ...) x * log(t)
>   )
>
>But  I keep gettting this error says:
>
>Error in if (any(infs)) warning(paste("Loglik converged before variable
>",  :
>   missing value where TRUE/FALSE needed
>In addition: Warning message:
>In Ops.factor(x, log(t)) : ?*? not meaningful for factors.
>
>How can I fix it? I know that the "Sex" and "Histology" are both
>categorical variants. I  want to have a model that have two ?(t) = a +
>blog(t) for each histology level.
>Thank you?
>
>
>
>	[[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.