[R] using survreg() in survival package with "long" data

Therneau, Terry M., Ph.D. therneau at mayo.edu
Mon Aug 31 15:56:34 CEST 2015


On 08/30/2015 05:00 AM, r-help-request at r-project.org wrote:
> I'm unable to fit a parametric survival regression using survreg() in the survival package with data in "counting-process" ("long") form.
>
> To illustrate using a scaled-down problem with 10 subjects (with data placed on the web):
>

As usual I'm a day late since I read digests, and Goran has already clarified things.  A
discussion of this is badly needed in my as yet unwritten book on using the survival
package.  From a higher level view:
   If an observation is interval censored (a,b) then one knows that the event happened 
between time "a" and time "b", but not when.  The survreg routine can handle interval 
censored data since it is parametric (you need to integrate over the interval).  The 
interval (-infinity, b) is called 'left censored' and the interval (a, infinity) is 'right 
censored'.  Left censored data is rare in medical work; an example might be a chronic
disease like rheumatoid arthritis, where we know that the true disease onset was some time
before the date it was first detected, and one is trying to deduce the duration of disease.
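
To make "integrate over the interval" concrete, here is a minimal sketch in plain Python.
It is not the survreg code; it assumes a Weibull distribution in the textbook shape/scale
parameterization (not the one survreg reports), and the function names and parameter
values are made up for illustration.  An observation known only to lie in (a, b)
contributes log(F(b) - F(a)) to the log-likelihood, and left and right censoring fall out
as the special cases a = -infinity and b = infinity.

```python
import math

def weibull_cdf(t, shape, scale):
    """F(t) = 1 - exp(-(t/scale)^shape); 0 for t <= 0 and 1 at +infinity."""
    if t <= 0:
        return 0.0
    if math.isinf(t):
        return 1.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def interval_loglik(a, b, shape, scale):
    """Log-likelihood contribution of an event known only to lie in (a, b)."""
    return math.log(weibull_cdf(b, shape, scale) - weibull_cdf(a, shape, scale))

# Illustrative parameter values only (assumptions, not estimates from any data).
shape, scale = 1.5, 10.0

interval = interval_loglik(2.0, 5.0, shape, scale)        # event in (2, 5)
left = interval_loglik(-math.inf, 5.0, shape, scale)      # left censored at 5
right = interval_loglik(2.0, math.inf, shape, scale)      # right censored at 2
```

The right censored case reduces to log S(a), the usual survival term, which is why the
same code path can serve all three kinds of censoring in a parametric model.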

   Left truncation at time 'a' means that any events before time "a" are not in the data 
set.  In a referral center like mine this includes any subjects who die before they come 
to us.  The coxph model handles left truncation naturally via its counting process 
formulation.  That same formulation also allows it to deal with time dependent 
covariates.   Accelerated failure time models such as survreg's can handle left truncation in
principle, but they require that the values of any covariates are known from time 0 -- 
even for a truncated subject.   I have never added left-truncation to the survreg code, 
mostly because I have never needed it myself, but also because users would immediately 
think that they could accomplish time-dependent covariates by simply using a long format 
data set. Rather, each subject needs to be linked to a full covariate history, which is a 
bit more work.
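
As a sketch of what left truncation means for a parametric likelihood (again plain
Python, not anything in the survival package, with a textbook Weibull and hypothetical
function names): a row observed on (entry, time] contributes f(time)/S(entry) if it ends
in an event and S(time)/S(entry) if it is censored.  With covariates that never change,
cutting one subject into long-format rows leaves the total unchanged, because the S()
terms telescope.

```python
import math

def log_surv(t, shape, scale):
    """log S(t) for a Weibull: -(t/scale)^shape."""
    return -((t / scale) ** shape)

def log_dens(t, shape, scale):
    """log f(t) for a Weibull, t > 0."""
    return (math.log(shape / scale)
            + (shape - 1.0) * math.log(t / scale)
            + log_surv(t, shape, scale))

def truncated_loglik(entry, time, event, shape, scale):
    """Contribution of a row on (entry, time], conditional on survival to entry."""
    top = log_dens(time, shape, scale) if event else log_surv(time, shape, scale)
    return top - log_surv(entry, shape, scale)

# Illustrative parameter values only (assumptions, not estimates from any data).
shape, scale = 1.5, 10.0

# One subject followed from time 0 who has an event at time 7 ...
whole = truncated_loglik(0.0, 7.0, True, shape, scale)

# ... and the same subject cut into two long-format rows at time 3.
rows = [(0.0, 3.0, False), (3.0, 7.0, True)]
split = sum(truncated_loglik(a, t, d, shape, scale) for a, t, d in rows)
# whole and split agree up to floating point rounding.
```

The agreement relies on every row sharing one set of parameters.  Once a covariate
changes between rows, S(entry) for the later row depends on what the covariates were
before entry, which is the full covariate history mentioned above.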

  So:  coxph does left truncation but not left (or interval) censoring
       survreg does interval censoring but not left truncation (or time dependent covariates).

Terry T


