[R] predict.coxph

Terry Therneau therneau at mayo.edu
Fri Nov 12 18:09:29 CET 2010


Since I read the list in digest form (and was out ill yesterday) I'm
late to the discussion.

There are 3 steps for predicting survival, using a Cox model:

1. Fit the data
 library(survival)
 fit <- coxph(Surv(time, status) ~ age + ph.ecog, data=lung)

The biggest question to answer here is which covariates you wish to base
the prediction on.  There is the usual tradeoff between too few (leaving
out something important) and too many (including unimportant things).

2. Get survival curves
  curves <- survfit(fit, newdata= _____)
The newdata needs to include all the covariates in your model.  
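
As a minimal sketch of what newdata might look like (the two subjects
below and the name newsubj are made up for illustration, not taken from
the original question), build a data frame with one row per subject and
one column per covariate in the model:

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)

# Hypothetical subjects: one row per predicted curve, one column for
# every covariate on the right-hand side of the model formula.
newsubj <- data.frame(age = c(50, 70), ph.ecog = c(0, 2))

curves <- survfit(fit, newdata = newsubj)

# One column of survival probabilities per row of newsubj.
n_curves <- ncol(curves$surv)
```

The column names in newsubj have to match the variable names used in
the coxph formula.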

3. Summarize
 Note that you don't get a single-number prediction for each subject;
you get a set of survival curves.  plot(curves[1]), for instance, shows
you the first one, plot(curves[2]) the second.
  print(curves) will give a one-line summary per curve, including the
median and, optionally, one of several versions of the mean. See the
discussion in help(print.survfit).  The mean is rarely used as a summary
because we don't see the whole distribution.  (Use temp <-
summary(curves); temp$table to use the printout values in further
calculations.)
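
A short sketch of the summary step, again using two made-up subjects
(newsubj is an illustrative name).  The times 180 and 365 are arbitrary
choices for the example:

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
newsubj <- data.frame(age = c(50, 70), ph.ecog = c(0, 2))  # hypothetical
curves <- survfit(fit, newdata = newsubj)

# One-line summary per curve, including the median.
print(curves)

# The same values as a matrix, for use in further calculations.
temp <- summary(curves)
medians <- temp$table[, "median"]

# Predicted survival probabilities at chosen follow-up times (in days).
at_times <- summary(curves, times = c(180, 365))
at_times$surv
```

Each subject is described by a whole curve; the median and the
survival at fixed times are just convenient one-number summaries of it.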

-------------------

  The same process applies for parametric survival using survreg.  In
return for specifying a distributional form, the predicted survival
curve for a particular subject is completely defined.  This includes the
mean and all quantiles.  Reliability analysis (survival analysis in
industry) uses parametric models almost exclusively, since the tail of the
distribution is of greatest interest.  Your use of
predict(,type='response') is almost correct; there is just the mathematical
detail that the Weibull is fit on a log scale, so the returned value is a
geometric mean time to death rather than an arithmetic mean.
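
To make the log-scale point concrete, here is a sketch with a Weibull
survreg fit to the same lung data.  The two subjects are hypothetical,
and the arithmetic mean line uses the standard Weibull mean formula
written in survreg's parameterization (scale = 1/shape), which is my
addition rather than anything predict() returns:

```r
library(survival)

wfit <- survreg(Surv(time, status) ~ age + ph.ecog, data = lung,
                dist = "weibull")
newsubj <- data.frame(age = c(50, 70), ph.ecog = c(0, 2))  # hypothetical

lp   <- predict(wfit, newdata = newsubj, type = "lp")
resp <- predict(wfit, newdata = newsubj, type = "response")

# type = "response" is the linear predictor transformed back from the
# log scale, i.e. exp(lp).
all.equal(as.numeric(resp), as.numeric(exp(lp)))

# Quantiles of the fitted distribution, here the median.
med <- predict(wfit, newdata = newsubj, type = "quantile", p = 0.5)

# Arithmetic mean of the fitted Weibull (standard formula, assumed
# parameterization: Weibull scale = exp(lp), shape = 1/wfit$scale).
amean <- exp(lp) * gamma(1 + wfit$scale)
```

Which of these to report depends on the question being asked; the
median is usually the easiest to explain.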

 The suggestion to use ordinary regression on the observed times is
wrong.  Censored data is more complex than that.
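
A small simulated illustration of why (all numbers here are invented
for the example: exponential death times with true mean 10, and
everyone still alive at time 8 is censored there).  Ordinary regression
on the observed times treats a censored time as if it were a death
time, so its answer can never exceed 8:

```r
library(survival)

set.seed(42)
n <- 500
true_time <- rexp(n, rate = 1/10)           # true mean time to death is 10
cens_time <- 8                              # follow-up stops at time 8
obs_time  <- pmin(true_time, cens_time)     # what we actually observe
status    <- as.numeric(true_time <= cens_time)  # 1 = death, 0 = censored

# Ordinary regression (intercept only) on the observed times.
naive <- unname(coef(lm(obs_time ~ 1)))

# Parametric survival model that accounts for the censoring.
efit <- survreg(Surv(obs_time, status) ~ 1, dist = "exponential")
est  <- unname(exp(coef(efit)))

c(naive = naive, survreg = est)
```

The survreg estimate is only trustworthy here because the assumed
distribution matches the one used to simulate the data.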

Terry Therneau


