[R] Predictions from a Cox model - understanding centering of binary/categorical variables

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Thu May 24 21:15:31 CEST 2018


Generally speaking, primarily statistical issues, which this appears to
be,  are OT on R-help, which is concerned with R programming issues
(although they do someties intersect). You might therefore do better on a
statistical forum such as stats.stackexchange.com (especially if you do not
get a satisfactory response here).

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, May 24, 2018 at 8:56 AM, Bonnett, Laura <L.J.Bonnett using liverpool.ac.uk
> wrote:

> Dear all,
>
> I am using R 3.4.3 on Windows 10.  I am preparing some teaching materials
> and I'm having trouble matching the by-hand version with the R code.
>
> I have fitted a Cox model - let's use the ovarian data as an example:
> library(survival)
> data(ovarian)
> ova_mod <- coxph(Surv(futime,fustat)~age+rx,data=ovarian)
>
> If I want to make predict survival for a new set of individuals at 100
> days then that is trivial using predict.coxph e.g.:
> newdata <- data.frame(futime=rep(100,5),fustat=rep(1,5),age=c(45,50,
> 55,60,65),rx=c(1,2,1,2,1))
> preds <- predict(ova_mod,newdata,type="expected")
> survpreds <- exp(-1*preds)
> survpreds
> [1] 0.9967635 0.9969739 0.9859543 0.9868629 0.9401437
>
> However, due to centering I believe, I am finding this a bit difficult to
> replicate by hand.
>
> To replicate the analysis I need the baseline survival at 100 days, the
> regression coefficients, and the mean/proportions.
> Baseline survival at 100 days: summary(survfit(ova_mod),time=100)$surv =
> 0.9888
> Regression coefficient: ova_mod$coef = 0.1473 (age) & -0.8040
> Mean age: mean(ovarian$age) = 56.1654
> Proportions with rx: prop.table(table(ovarian$rx)) = 0.5 0.5
>
> So, to recreate the predicted survival probability I would work out the
> linear predictor for my first individual:
> LP1 = ((45-56.1654)*0.1437)+((1*0.5)*-0.8040) = -2.0470
> However, predict(ova_mod,newdata,type="lp") suggests that it should be
> -1.2430.  Therefore have I misinterpreted the centering?  I.e that we take
> off the mean value from continuous variables, and multiply by the
> proportion with that response for binary/categorical variables?
>
> Then, assuming the I have the correct LP of -1.2430, I need to raise the
> baseline survival estimate at 100 days to the exp(LP1).  This gives 0.9968
> which is almost the predicted value of 0.9968 of above.  However, I need to
> get the linear predictors to agree with the output first!
>
> Many thanks for your help with this.
>
> Kind regards,
> Laura
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]




More information about the R-help mailing list