[R] coxph linear.predictors

Terry Therneau therneau at mayo.edu
Fri Oct 29 00:38:51 CEST 2010


Gentlemen,
  I read R-news in batch mode so I'm often a day behind.  Let me try to
answer some of the questions.

 1. X*beta  != linear.predictor.  
I'm sorry if the documentation isn't all it could be.  Between the book,
tech report, and help I've written about 400 pages, but this particular
topic isn't yet in it.  The final snipe about being "opaque like SAS"
was really unfair.
The Cox model is a relative risk model, if lp is a linear predictor then
so is lp +c for any constant; they are equally good and equally valid.
The linear.predictor component in a coxph fit is (X-means) * beta.  The
computation exp(lp) occurs multiple times downstream and this keep the
exp function from overflowing when there is something like a Date object
as a predictor.  Adding this constant changes not a single downstream
calcuation.

2. Survfit is too slow.
 I'd like to hear more about this.  My work mostly involves modest data
sets so perhaps I haven't seen it.  Accuracy and maintainability have
been my first worries.

3. Baseline survival.
 Let xbase be a particular set of values for the x covariates (one for
each).  The survival curve for a given xbase is obtained from survfit
   fit <- coxph(....
   sfit <- survfit(fit, newdata=xbase)
   chaz <- -log(sfit$surv)  #cumulative hazard
(The xbase vector will need to have variable names for the function to
know which value goes to which of course).

The cumulative hazard for any other subject will be 
   newhaz <- chaz * exp(fit$coef%*% (x-xbase))
There is not a simple transformation of the standard error from one fit
to another, however.  You will need to call survfit with a data frame
for newdata, which will return one curve per row with the proper values.

In my view there is no such thing as "A" baseline survival curve.  Any
xbase you chose is a baseline.  However, it is wise to choose something
near the center of the data space in order to avoid numeric problems
with the exp function above.  I would never ever chose a vector of
zeros, although some text books do -- it saves them about 8 characters
of typing in the newhaz formula above.

Terry Therneau



More information about the R-help mailing list