[R] Kaplan-Meier plotting quirks

Michael Rentz rent0009 at umn.edu
Fri Oct 19 17:35:16 CEST 2012

Terry: Thank you, that makes quite a bit of sense. In transposing the data 
to intervals (corresponding to trapping runs) it becomes quite clear that 
tag loss is very high up front, and that tags survive very well after the 
initial period. This is what I needed to know. I think I had a case of too 
many trees to see the forest. I did not set up exactly as you suggest, but 
since I do not expect there to be any interval effect (just number of 
intervals) I think I am OK.

Thank you again, and glad to see another Minnesotan here.


On Oct 18 2012, Terry Therneau wrote:

> Better would be to use interval censored data. Create your data set so
> that you have (time1, time2) pairs, each of which describes the interval
> of time over which the tag was lost. So an animal first captured at time
> 10 sans tag would be (0,10); with tag at 5 and without at 20 would be
> (5,20), and last seen with tag at 30 would be (30,
> Then survfit(Surv(time1, time2, type='interval2') ~ 1, data=yourdata)
> will give a curve that accounts for interval censoring.
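The interval coding described above can be sketched as follows. This is only a minimal illustration: the data frame `tags` and every value in it are invented, and tags that were never seen lost are coded with `time2 = NA`, which is how `Surv(..., type = "interval2")` marks right censoring.

```r
library(survival)

# Hypothetical toy data, invented for illustration (not the poster's data).
# time1 = last day the tag was confirmed present (0 = day of tagging)
# time2 = first day the animal was caught without the tag
#         (NA = tag never seen lost, i.e. right-censored at time1)
tags <- data.frame(
  time1 = c(0, 5, 30, 2, 0, 12, 45, 3),
  time2 = c(10, 20, NA, 30, 3, 40, NA, 28)
)

# Nonparametric survival curve that accounts for the interval censoring
fit <- survfit(Surv(time1, time2, type = "interval2") ~ 1, data = tags)

summary(fit)
plot(fit, xlab = "Days since tagging", ylab = "Proportion of tags retained")
```

Each row carries both the minimum confirmed and the maximum possible retention time, so there is no need to choose one of them before fitting.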
> As a prior poster suggested, if the times are very sparse then you may
> be better off assuming a smooth curve. Use the survreg function with the
> same equation as above; see help("predict.survreg") for an example of how
> to draw the resulting survival curve.
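A smooth-curve version might look like the sketch below. Several things here are assumptions and not part of the advice above: the Weibull distribution is just one possible choice, the data are invented, and the curve is drawn directly from the fitted coefficients instead of through `predict`. Tags already missing at the first recapture are coded as left-censored (`time1 = NA`) instead of with a lower limit of 0, because the Weibull fit works on the log of time and a zero endpoint may be rejected.

```r
library(survival)

# Hypothetical toy data, invented for illustration.
# time1 = NA : tag already gone at the first recapture (left-censored at time2)
# time2 = NA : tag still present when last seen (right-censored at time1)
tags <- data.frame(
  time1 = c(NA, NA, 5, 2, 12, 3, 30, 45, 60, 90),
  time2 = c(10, 3, 20, 30, 40, 28, NA, NA, NA, NA)
)

# Same response as in the survfit call, fitted as a Weibull model
wfit <- survreg(Surv(time1, time2, type = "interval2") ~ 1,
                data = tags, dist = "weibull")

# survreg's Weibull parameterisation in terms of pweibull:
# shape = 1 / (survreg scale), scale = exp(survreg intercept)
shape <- 1 / wfit$scale
scale <- exp(unname(coef(wfit)))

# Fitted survival curve S(t) on a daily grid
days <- seq(0, 120, by = 1)
surv <- pweibull(days, shape = shape, scale = scale, lower.tail = FALSE)

plot(days, surv, type = "l", ylim = c(0, 1),
     xlab = "Days since tagging", ylab = "Proportion of tags retained")
```

A fitted shape below 1 would correspond to a hazard that falls over time, that is, heavy tag loss early on followed by good retention.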
>Terry Therneau
>On 10/18/2012 05:00 AM, r-help-request at r-project.org wrote:
>> -----Original Message-----
>> From: Michael Rentz [mailto:rent0009 at umn.edu]
>> Sent: Tuesday, October 16, 2012 12:36 PM
>> To: r-help at r-project.org
>> Subject: [R] R Kaplan-Meier plotting quirks?
   Hello. I apologize in advance for the VERY lengthy e-mail. I endeavor to 
include enough detail.
   I have a question about survival curves I have been battling off and on 
for a few months. No one local seems to be able to help, so I turn here. 
The issue seems to either be how R calculates Kaplan-Meier Plots, or 
something with the underlying statistic itself that I am misunderstanding. 
Basically, longer survival times are yielding steeper drops in survival 
than a set of shorter survival times but with the same number of loss and 
retention events.
   As a minor part of my research I have been comparing tag survival in 
marked wild rodents. I am comparing a standard ear tag with a relatively 
new technique. The newer tag clearly "wins" using survival tests, but the 
resultant Kaplan-Meier plot does not seem to make sense. Since I am dealing 
with a wild animal and only trapped a few days out of a month the data is 
fairly messy, with gaps in capture history that require assumptions of tag 
survival. An animal that is tagged and recaptured 2 days later with a tag 
and 30 days later without one could have an assumed tag retention of 2 days 
(minimum confirmed) or 30 days (maximum possible).
   Both are significant with a survtest, but the K-M plots differ. A plot 
of minimum confirmed (overall harsher data, lots of 0 days and 1 or 2 days) 
yields a curve with a steep initial drop in "survival", but then a leveling 
off and straight line thereafter at about 80% survival. Plotting the 
maximum possible dates (same number of losses/retention, but retention 
times are longer, the length to the next capture without a tag, typically 
25-30 days or more) does not show as steep of a drop in the first few 
days, but at about the point the minimum estimate levels off this one 
begins dropping steeply. 400 days out the plot with minimum possible 
estimates has tag survival of about 80%, whereas the plot with the same 
loss rate but longer assumed survival times shows only a 20% assumed 
survival at 400 days. Complicating this of course is the fact that the 
great majority of the animals die before the tag is lost, survival of the 
rodents is on the order of months.
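The two codings described above can be compared on a toy data set. This is a sketch under stated assumptions: all capture histories below are invented, and the column names `last_with` and `first_without` are hypothetical. It relies only on the fact that each Kaplan-Meier step multiplies survival by (1 - d/n), where n is the number still at risk, so the same loss events cost more when they are placed late, after many animals have already been censored.

```r
library(survival)

# Hypothetical capture histories, invented for illustration only.
# last_with     = last day the tag was confirmed present
# first_without = first day the animal was caught without the tag
#                 (NA = never seen without it, e.g. the animal died first)
caps <- data.frame(
  last_with     = c(0, 1, 2, 0, 2, 3, 5, 8, 10, 15, 25, 40, 60, 90, 120),
  first_without = c(28, 30, 30, 27, 31, rep(NA, 10))
)
lost <- !is.na(caps$first_without)

# Minimum confirmed retention: loss assumed right after the last sighting
t_min <- caps$last_with
# Maximum possible retention: loss assumed at the first capture without a tag
t_max <- ifelse(lost, caps$first_without, caps$last_with)

km_min <- survfit(Surv(t_min, lost) ~ 1)
km_max <- survfit(Surv(t_max, lost) ~ 1)

# Same number of losses, but different numbers at risk when they occur
summary(km_min)
summary(km_max)

plot(km_min, conf.int = FALSE, xlab = "Days", ylab = "Tag survival")
lines(km_max, conf.int = FALSE, lty = 2)
legend("bottomleft", c("minimum confirmed", "maximum possible"), lty = 1:2)
```

In the minimum coding all losses fall in the first days, when every animal is still at risk; in the maximum coding they fall around day 30, after the short-lived animals have dropped out of the risk set.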
   I really am not sure what is going on, unless somehow the high number of 
events in the first few days followed by few events thereafter leads to the 
assumption that after the initial few days survival of the tag is high. The 
plotting of maximum lengths has a more even distribution of events, rather 
than a clumping in the first few days, so I guess the model assumes 
relatively constant hazards? As an aside, a plot of the mean between the 
minimum and maximum almost mirrors the maximum plot. Adding five days to 
the minimum when the minimum plus 5 is less than the maximum returns a plot 
with a steeper initial drop, but then constant thereafter, mimicking the 
minimum plot, but at a lower final survival rate.
   Basically, I am at a loss why surviving longer would *decrease* the 
survival rate???
   My co-author wants to drop the K-M graph given the confusion, but I 
think it would be odd to publish a survival paper without one. I am not 
sure which graph to use? They say very different things, while the actual 
statistics do not differ that greatly.
   I am more than happy to provide the data and code for anyone who would 
like to help if the above is not explanation enough. Thank you in advance.
>> Mike.
>> --
>> Michael S. Rentz
>> PhD Candidate, Conservation Biology
>> University of Minnesota
>> 5122 Idlewild Street
>> Duluth, MN 55804
>> (218) 525-3299
>> rent0009 at umn.edu

Michael S. Rentz
PhD Candidate, Conservation Biology
University of Minnesota
5122 Idlewild Street
Duluth, MN 55804
(218) 525-3299
rent0009 at umn.edu
