[R] Expected number of events, Andersen-Gill model fit via coxph in package survival

Omar De la Cruz C. odelacruzc at gmail.com
Thu Oct 11 19:26:04 CEST 2012

Thank you, Dr. Therneau, that was very helpful.

Best regards,


On Mon, Oct 8, 2012 at 9:58 AM, Terry Therneau <therneau at mayo.edu> wrote:
>> I am interested in producing the expected number of events, in a
>> recurring events setting. I am using the Andersen-Gill model, as fit
>> by the function "coxph" in the package "survival."
>> I need to produce expected numbers of events for a cohort,
>> cumulatively, at several fixed times. My ultimate goal is: To fit an
>> AG model to a reference sample, then use that fitted model to generate
>> expected numbers of events for a new cohort; then, comparing the
>> expected vs. the observed numbers of events would give us some idea of
>> whether the new cohort differs from the reference one.
>>> From my reading of the documentation and the text by Therneau and
>> Grambsch, it seems that the function "survexp" is what I need. But
>> using it I am not able to obtain expected numbers of events that match
>> reasonably well the observed numbers *even for the same reference
>> population.* So, I think I am misunderstanding something quite badly.
>  You've hit a common confusion.  Observed versus expected events
> computations are done on a cumulative hazard scale H, not the surivival
> scale S; S = exp(-H).  Relating this back to simple Poisson models H(t)
> would be the expected number of events by time t and S(t) the probability of
> "no events before time t".  G. Berry (Biometrics 1983) has a classic ane
> readable article on this (especially if you ignore the proofs).
>   Using your example:
>> cphfit <-
>> coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)
>> zz <- predict(cphfit, type='expected')
>> c(sum(zz), sum(bladder2$event))
> [1] 112 112
>> tdata <- bladder2[1:10]   #new data set (lazy way)
>> predict(cphfit, type='expected', newdata=tdata)
>  [1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665
>  [8] 0.8864808 0.2932915 0.5190647
>  You can also do this using survexp and the cohort=FALSE argument, which
> would return S(t) for each subject and we would then use -log(result) to get
> H.  This is how it was done when I wrote the book, but the newer predict
> function is easier.
> Terry Therneau

More information about the R-help mailing list