[R] Building a web risk calculator based on Cox, PH--definitive method for calculating probability?

Terry Therneau therneau at mayo.edu
Wed Jul 18 15:06:47 CEST 2012

Here is an example of how to do it.
 > library(survival)
 > vfit <- coxph(Surv(time, status) ~ celltype + trt, data=veteran)

 > userinput <- data.frame(celltype="smallcell", trt = 1)
 > usercurve <- survfit(vfit, newdata=userinput)  #the entire predicted 
survival curve
 > user2 <- summary(usercurve, time= 2*365.25)    # 2 year time point
 > user2$surv
[1] 0.0007438084

Some comments:
  1. The summary function for survival curves was written so that people 
could print out shorter summaries, but it works nicely for picking off a 
particular time point.  Since the curve is a step function and there 
isn't likely to be a step at exactly "x" years, this is a bit more work 
to do yourself.  You might want to include the confidence limits in your 
web report as well.

  2. Put the whole formula into your coxph call.  I have never ever 
understood why people use the construct
      tempvar <- with(data, Surv(time, status))
      coxph(tempvar ~ age + sex + ....
It leaves you with harder to read code, poorer documentation (the 
printout from coxph no longer shows the actual response variable), leads 
to hard-to-diagnose failures for certain uses of predict, ... the list 
goes on.  I have not yet thought of a single good reason for doing it 
other than "because you can".

  3. Make the user data the same as the original.  In the veteran cancer 
data set "trt" is a numeric 0/1 variable, you had it as a factor in the 
new data set.

  4. Your should get your keyboard fixed -- it appears that the spacebar 
is disabled when writing code  :-)

  5. If you plot the survival curve for the veterans cancer data set it 
only reaches to about 2 1/2 years, so the summary for 5 years will 
return NULL.

Terry Therneau

On 07/18/2012 05:00 AM, r-help-request at r-project.org wrote:
> I am a medical student and as a capstone for my summer research project I am
> going to create a simple online web "calculator" for users to input their
> relevant data, and a probability of relapse within 5 years will be computed
> and returned based on the Cox PH model I have developed.
> The issue I'm having is finding a definitive method/function to feed the
> user's "newdata" and return the probability of relapse within 5 years.  I
> have googled this and the answers seems to be inconsistent; I have variously
> seen people recommend survest(), survfit(), and predict.coxph().  Terry had
> a useful answer
> http://r.789695.n4.nabble.com/how-to-calculate-predicted-probability-of-Cox-model-td4515265.html
> here  but I didn't quite understand what he meant in his last sentence.
> Here is some code for you to quickly illustrate what you suggest.
> library(rms)
> library(survival)
> library(Hmisc)
> data(veteran)
> dd=datadist(veteran)
> options(datadist='dd')
> options(digits=4)
> obj=with(veteran,Surv(time,status))
> vetcoxph=coxph(obj~celltype+trt,data=veteran)    #I will fit models from
> both the survival and rms packages so you can
> #use what you like
> vetcph=cph(obj~celltype+trt,data=veteran,surv=TRUE,time.inc=5*365,x=T,y=T)
> #let's say the user inputted that their cell type was smallcell and their
> treatment was "1".
> userinput=data.frame(celltype='smallcell',trt=factor(1))
> I really appreciate your recommendations
> Best,
> Jahan

More information about the R-help mailing list