[R] Expected number of events, Andersen-Gill model fit via coxph in package survival

Sun Oct 21 01:06:10 CEST 2012

I have a follow-up question (for either Dr. Therneau, or anybody who
might know).

sum(zz) (see below) estimates the number of events for the cohort.
Now, how can I compute a confidence interval for sum(zz)? Or a
standard error?

My obvious choice, square root of the sum of the squares of the
standard errors for zz provided by predict.coxph, turns out to be too
small.

Thank you for any suggestions.

Omar.

On Thu, Oct 11, 2012 at 1:26 PM, Omar De la Cruz C.
<odelacruzc at gmail.com> wrote:
> Thank you, Dr. Therneau, that was very helpful.
>
> Best regards,
>
> Omar.
>
>
> On Mon, Oct 8, 2012 at 9:58 AM, Terry Therneau <therneau at mayo.edu> wrote:
>>
>>> I am interested in producing the expected number of events, in a
>>> recurring events setting. I am using the Andersen-Gill model, as fit
>>> by the function "coxph" in the package "survival."
>>>
>>> I need to produce expected numbers of events for a cohort,
>>> cumulatively, at several fixed times. My ultimate goal is: To fit an
>>> AG model to a reference sample, then use that fitted model to generate
>>> expected numbers of events for a new cohort; then, comparing the
>>> expected vs. the observed numbers of events would give us some idea of
>>> whether the new cohort differs from the reference one.
>>>
>>>> From my reading of the documentation and the text by Therneau and
>>>
>>> Grambsch, it seems that the function "survexp" is what I need. But
>>> using it I am not able to obtain expected numbers of events that match
>>> reasonably well the observed numbers *even for the same reference
>>> population.* So, I think I am misunderstanding something quite badly.
>>>
>>
>>  You've hit a common confusion.  Observed versus expected events
>> computations are done on a cumulative hazard scale H, not the surivival
>> scale S; S = exp(-H).  Relating this back to simple Poisson models H(t)
>> would be the expected number of events by time t and S(t) the probability of
>> "no events before time t".  G. Berry (Biometrics 1983) has a classic ane
>> readable article on this (especially if you ignore the proofs).
>>
>>   Using your example:
>>
>>> cphfit <-
>>> coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)
>>> zz <- predict(cphfit, type='expected')
>>> c(sum(zz), sum(bladder2$event))
>> [1] 112 112
>>
>>> tdata <- bladder2[1:10]   #new data set (lazy way)
>>> predict(cphfit, type='expected', newdata=tdata)
>>  [1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665
>>  [8] 0.8864808 0.2932915 0.5190647
>>
>>
>>  You can also do this using survexp and the cohort=FALSE argument, which
>> would return S(t) for each subject and we would then use -log(result) to get
>> H.  This is how it was done when I wrote the book, but the newer predict
>> function is easier.
>>
>> Terry Therneau
>>
>>