[R] survival analysis: interval censored data

David Winsemius dwinsemius at comcast.net
Wed Sep 28 22:33:40 CEST 2011


On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote:

>
>
> Hello Terry:
>
> I attached the araceae data set,

The usual survival analysis via the Kaplan-Meier method only makes
estimates at the times of events. When you tabulate your data, you can
see that, in those categories, there were no events in the (starting)
'time' rows during the intervals that you are questioning as
missing:

xtabs( ~ time+time2+categoria+event, data=araceae)
, , categoria = C, event = 0

       time2
time   2005 2006 2007 2008 2009 2010
   2004    0   23    1    3    1   22
   2005    0    0    0    0    0    0
   2007    0    0    0    0    4   19
   2008    0    0    0    0    0    0
   2009    0    0    0    0    0    0

, , categoria = E, event = 0

       time2
time   2005 2006 2007 2008 2009 2010
   2004    0   22    0    7    3   21
   2005    0    0    1    1    0    0
   2007    0    0    0    0    0   29
   2008    0    0    0    0    0    0
   2009    0    0    0    0    0    1


, , categoria = C, event = 1

       time2
time   2005 2006 2007 2008 2009 2010
   2004    0    5    2    3    0    3
   2005    0    0    0    0    0    0
   2007    0    0    0    2    3    2
   2008    0    0    0    0    1    0
   2009    0    0    0    0    0    0

, , categoria = E, event = 1

       time2
time   2005 2006 2007 2008 2009 2010
   2004    7    2    1    1    3    4
   2005    0    0    0    1    0    0
   2007    0    0    0    3    1    3
   2008    0    0    0    0    0    0
   2009    0    0    0    0    0    0

>
> when I use this:
>
> surara<-survfit(Surv(time,time2,event)~categoria)
>
> Call: survfit(formula = Surv(time, time2, event) ~ categoria)
>
>             records n.max n.start events median 0.95LCL 0.95UCL
> categoria=C      94    63       0     21     NA      NA      NA
> categoria=E     111    77       0     26     NA      NA      NA
>> summary(surara)
> Call: survfit(formula = Surv(time, time2, event) ~ categoria)
>
>                 categoria=C
>  time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI
>  2006     63       5       0       23    0.921  0.0341        0.856        0.990
>  2007     35       2      30        1    0.868  0.0483        0.778        0.968
>  2008     62       5       1        3    0.798  0.0536        0.700        0.910
>  2009     55       4       0        5    0.740  0.0570        0.636        0.861
>  2010     46       5       0       41    0.660  0.0611        0.550        0.791
>
>                 categoria=E
>  time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI
>  2005     71       7       3        0    0.901  0.0354        0.835        0.973
>  2006     67       2       0       22    0.875  0.0391        0.801        0.955
>  2007     43       1      36        1    0.854  0.0432        0.774        0.943
>  2008     77       5       0        8    0.799  0.0469        0.712        0.896
>  2009     64       4       1        3    0.749  0.0502        0.657        0.854
>  2010     58       7       0       51    0.658  0.0545        0.560        0.774
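
As a cross-check on the quoted summary, the 'survival' column for
categoria=C can be rebuilt from its 'n.risk' and 'n.event' columns with
the usual product-limit (Kaplan-Meier) formula. This is only a sketch:
the numbers are copied by hand from the output above, and the variable
names are mine, not part of the poster's code.

```r
# n.risk and n.event for categoria=C, copied from the summary() output above
n_risk  <- c(63, 35, 62, 55, 46)
n_event <- c(5, 2, 5, 4, 5)

# Product-limit estimate: cumulative product of (1 - d_i / n_i),
# which only changes at times where an event occurred
surv <- cumprod(1 - n_event / n_risk)
names(surv) <- 2006:2010

round(surv, 3)
```

A year in which n.event is zero would contribute a factor of 1, so it
does not change the estimate and does not get a row in the summary.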

You can see that your first survfit object is offering a simple sum of
the 'time2' columns of that event = 1 tabulation as its 'n.event' values.
Its 'n.risk' tabulation is not taking note of whether a case started in
any particular prior interval. The n.risk sum appears to be the sum of
persons surviving from the prior year, less any decedents, plus any
entrants, as reflected in "future" events on that row. Notice that
years are missing even from that report: 2004 and 2005 for
category C, and 2004 for category E, since there are no events in the
columns for those 'time2' values.
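
The description above can be checked against the tabulation for
categoria=C. The sketch below retypes the two xtabs() tables by hand
(the helper 'mk' and all object names are my own), takes column sums of
the event = 1 table for n.event, and counts a case as at risk in year t
when its 'time' is before t and its 'time2' is t or later. That at-risk
rule is my reading of how the (start, stop] form is handled, so treat it
as an assumption.

```r
# Rows are start years ('time'), columns are end years ('time2')
mk <- function(x) matrix(x, nrow = 5, byrow = TRUE,
                         dimnames = list(c(2004, 2005, 2007, 2008, 2009),
                                         2005:2010))

# categoria = C, event = 0 (copied from the tabulation above)
ev0_C <- mk(c(0, 23, 1, 3, 1, 22,
              0,  0, 0, 0, 0,  0,
              0,  0, 0, 0, 4, 19,
              0,  0, 0, 0, 0,  0,
              0,  0, 0, 0, 0,  0))

# categoria = C, event = 1 (copied from the tabulation above)
ev1_C <- mk(c(0, 5, 2, 3, 0, 3,
              0, 0, 0, 0, 0, 0,
              0, 0, 0, 2, 3, 2,
              0, 0, 0, 0, 1, 0,
              0, 0, 0, 0, 0, 0))

all_C <- ev0_C + ev1_C
start <- as.numeric(rownames(all_C))
stop  <- as.numeric(colnames(all_C))

# n.event: column sums of the event = 1 table
n_event_C <- colSums(ev1_C)

# n.risk at year t: cases with time < t <= time2 (assumed rule)
n_risk_C <- sapply(stop, function(t) sum(all_C[start < t, stop >= t]))

data.frame(time2 = stop, n.risk = n_risk_C, n.event = n_event_C)
```

For 2006 through 2010 these agree with the n.risk and n.event columns in
the first summary for categoria=C; 2005 has no events in C, which is why
it has no row there.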

>
> but when I included type='interval',
>
>> suraraint <- survfit(Surv(time,time2,event,type='interval')~categoria)
>> # still need to fix the interval issue!!!
>> summary(suraraint)
> Call: survfit(formula = Surv(time, time2, event, type = "interval") ~
>     categoria)
>
>                 categoria=C
>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
>  2004  95.00   13.14    0.862  0.0354        0.795        0.934
>  2007  31.86    7.19    0.667  0.0695        0.544        0.818
>  2008   1.67    1.67    0.000     NaN           NA           NA
>
>                 categoria=E
>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
>  2004  112.0   18.47    0.835  0.0351        0.769        0.907
>  2005   40.5    1.06    0.813  0.0401        0.738        0.896
>  2007   37.5    7.46    0.651  0.0620        0.540        0.785


The second object's n.event, when Surv() was constructed with
type="interval", has values based on the starting 'time' rows, but I
am unable to deduce the estimating algorithm. I remember Therneau
saying it wasn't a simple algorithm. The 2008 row in category C has
a single entry of 1 in the next year, and there was no censoring for
C-entrants in that year. Why the n.event is 1.67 I cannot say, but at
least the n.event does not exceed the n.risk. The code or a copy of
Therneau and Grambsch would be sensible places to look for an answer, but
my initial efforts in those directions have not been illuminating.
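
One possible reading, offered as an assumption rather than a confirmed
diagnosis: the Surv() help page describes a different status coding for
type="interval" (0 = right censored, 1 = event at 'time', 2 = left
censored, 3 = interval censored). Under that coding a 0/1 'event' column
would be read as exact or right-censored observations at the starting
'time', with 'time2' not used, which would be consistent with rows
appearing only at starting years. The sketch below uses three made-up
cases (not the araceae data) and hypothetical names 'start', 'end',
'lower', 'upper' to compare that call with a type="interval2" coding. It
assumes event = 1 means death was found at census 'end' after the plant
was seen alive at 'start', and event = 0 means still alive at 'end'; it
ignores delayed entry.

```r
library(survival)

# Hypothetical toy cases, not taken from the araceae data
start <- c(2004, 2004, 2007)
end   <- c(2006, 2010, 2008)
event <- c(1, 0, 1)

# 0/1 status passed with type = "interval", as in the original call
s1 <- Surv(start, end, event, type = "interval")
print(s1)

# Alternative coding with type = "interval2":
#   event = 1 -> death somewhere in (start, end)
#   event = 0 -> alive at 'end', so lower = end and upper = NA (right censored)
lower <- ifelse(event == 1, start, end)
upper <- ifelse(event == 1, end, NA)
s2 <- Surv(lower, upper, type = "interval2")
print(s2)

# Status codes actually stored in each object (third column)
unclass(s1)[, 3]
unclass(s2)[, 3]
```

Comparing the two printed objects shows whether 'end' is being used at
all in the first form.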

-- 
David.

>
> it does not calculate survival for every year
>
> I have a one-year interval between each census
>
>
>
> ________________________________
> From: Terry Therneau <therneau at mayo.edu>
> To: Ruth Arias <rueu_ac at yahoo.es>
> CC: r-help at r-project.org
> Sent: Wednesday, September 28, 2011 16:00
> Subject: Re:  survival analysis: interval censored data
>
> You have still not given me enough information to reproduce your
> problem.  "Why doesn't it include all years?"  I have no way of knowing,
> since we have no data.
>
> --- begin included message --
> hello David
>
> when I use type= 'interval'
>
> Call: survfit(formula = Surv(ingreso, fecha, estado, type = "interval") ~
>     categoria)
>
> and when I use just
>
> Call: survfit(formula = Surv(ingreso, fecha, estado) ~ categoria)
>
> I don't know why, when I use type = "interval", it does not calculate
> survival for every year
>
>
> regards
> <araceae.txt>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


