[R] R Kaplan-Meier plotting quirks?
rent0009 at umn.edu
Tue Oct 16 18:36:06 CEST 2012
Hello. I apologize in advance for the VERY lengthy e-mail; I have tried to
include enough detail.
I have a question about survival curves that I have been battling off and on
for a few months. No one locally seems to be able to help, so I am turning
here. The issue seems to be either how R calculates Kaplan-Meier plots or
something about the underlying statistic itself that I am misunderstanding.
Basically, longer survival times are yielding steeper drops in survival than
a set of shorter survival times with the same numbers of losses and
retentions.
As a minor part of my research I have been comparing tag survival in marked
wild rodents: a standard ear tag versus a relatively new technique. The
newer tag clearly “wins” on survival tests, but the resulting Kaplan-Meier
plot does not seem to make sense. Since I am dealing with a wild animal and
trapped only a few days each month, the data are fairly messy, with gaps in
capture history that require assumptions about tag survival. An animal that
is tagged, recaptured 2 days later with its tag, and recaptured 30 days
later without it could have an assumed tag retention of either 2 days
(minimum confirmed) or 30 days (maximum possible).
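To make the two assumptions concrete, here is a minimal sketch of how the
minimum confirmed and maximum possible retention times could be derived from
one animal's capture history. The record format (a list of
(day since tagging, tag present) pairs, with tagging recorded as day 0) and
the function name retention_bounds are hypothetical, chosen only for
illustration; it also assumes a lost tag is never seen again.

```python
from typing import List, Tuple

# Hypothetical record format: (day since tagging, tag_present) per capture,
# with the tagging event itself recorded as (0, True).
Capture = Tuple[int, bool]


def retention_bounds(history: List[Capture]) -> Tuple[int, int, bool]:
    """Return (minimum confirmed, maximum possible, tag_lost) for one animal.

    Assumes a tag, once lost, is never seen again on the same animal.
    """
    history = sorted(history)
    # Last capture at which the tag was confirmed present.
    last_with_tag = max(day for day, present in history if present)
    # Captures after that point at which the tag was missing.
    lost_days = [day for day, present in history
                 if not present and day > last_with_tag]
    if not lost_days:
        # Tag never observed missing: right-censored at the last sighting with a tag.
        return last_with_tag, last_with_tag, False
    # Loss happened somewhere between the last tagged and first untagged capture.
    return last_with_tag, min(lost_days), True


# The example from the text: tagged, seen with tag on day 2, without it on day 30.
print(retention_bounds([(0, True), (2, True), (30, False)]))  # (2, 30, True)
```

Under this sketch the event indicator is the same for both versions; only
the time attached to each loss changes.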
Both versions are significant with a survtest, but the K-M plots differ. The
plot of minimum confirmed times (harsher data overall, with lots of 0-, 1-
and 2-day times) shows a steep initial drop in “survival” and then levels
off into a straight line at about 80% survival. The plot of maximum possible
times (same numbers of losses and retentions, but longer retention times,
namely the time to the next capture without a tag, typically 25-30 days or
more) does not drop as steeply in the first few days, but at about the point
where the minimum curve levels off, this one begins dropping steeply. At 400
days, the plot of minimum confirmed times shows tag survival of about 80%,
whereas the plot with the same loss rate but longer assumed survival times
shows only about 20% survival. Complicating this, of course, is the fact
that the great majority of the animals die before the tag is lost; rodent
survival is on the order of months.
I really am not sure what is going on, unless the high number of events in
the first few days, followed by few events thereafter, somehow leads to the
assumption that tag survival is high after those first few days. The
maximum-length data have a more even distribution of events rather than a
clump in the first few days, so I guess the model assumes relatively
constant hazards? As an aside, a plot of the mean of the minimum and maximum
almost mirrors the maximum plot. Adding five days to the minimum (whenever
the minimum plus 5 is less than the maximum) gives a plot with a steeper
initial drop that is then constant thereafter, mimicking the minimum plot
but at a lower final survival rate.
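One way to check whether the risk-set size explains this is a small
hand-rolled product-limit calculation. The sketch below is a from-scratch
Kaplan-Meier estimator on invented toy numbers (10 hypothetical tags, 3 lost,
7 animals dying with their tags at the same times in both versions); it is
not the poster's data and not R's implementation. Ties between a loss and a
death at the same time count the dead animal as still at risk, the usual
convention.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float],
                 events: List[bool]) -> List[Tuple[float, int, int, float]]:
    """Return (time, at_risk, losses, survival) at each distinct loss time.

    events[i] is True if tag i was lost at times[i], False if right-censored
    (e.g. the animal died still carrying the tag).
    """
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)  # tags still observed just before t
        losses = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - losses / at_risk
        curve.append((t, at_risk, losses, survival))
    return curve


# Invented toy data: deaths (censored) are identical in both versions.
deaths = [20, 30, 40, 50, 60, 70, 80]
min_times = [1, 1, 2] + deaths     # losses dated at minimum confirmed times
max_times = [28, 35, 65] + deaths  # same losses dated at maximum possible times
events = [True] * 3 + [False] * len(deaths)

for label, times in (("minimum", min_times), ("maximum", max_times)):
    for t, n, d, s in kaplan_meier(times, events):
        print(label, t, n, d, round(s, 3))
```

With the same three losses, the minimum version ends at about 0.7 (losses
divided by 10 and 8 tags at risk), while the maximum version ends at about
0.508 (losses divided by 9, 7 and only 3 tags at risk), because deaths have
already thinned the risk set by the later loss times.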
Basically, I am at a loss as to why surviving longer would *decrease* the
estimated survival.
My co-author wants to drop the K-M graph given the confusion, but I think
it would be odd to publish a survival paper without one. I am also not sure
which graph to use: they say very different things, while the actual
statistics do not differ that greatly.
I am more than happy to provide the data and code to anyone who would like
to help, if the above is not explanation enough. Thank you in advance.
Michael S. Rentz
PhD Candidate, Conservation Biology
University of Minnesota
5122 Idlewild Street
Duluth, MN 55804
rent0009 at umn.edu