[R] Using coxph with Gompertz-distributed survival data.

Fri Feb 5 01:31:31 CET 2010

Dear list:

I am attempting to use what I thought would be a pretty straightforward practical application of Cox regression. I figure users of the survival package must have come across this problem before, so I would like to ask you how you dealt with it. I have set up an illustrative example and included it at the end of this post.

I took a sample of 100 data points from each of two populations ('a' and 'b') known to have Gompertz distributions that differ in both their lambda (initial rate, which is higher in population 'a') and gamma (acceleration rate, which is higher in population 'b') parameters. Each data point represents the age at which an experimental animal died.

First question: given the way the parameters differ between the Gompertz distributions, the mortality rate will be higher for 'a' younger ages, higher for 'b' at older ages, and the assumption of the Cox Proportional Hazards model is violated a priori, isn't it? cox.zph confirms this. Yet I found plenty of Gompertz parameter values that differ, and lead to differences in survival times detectable by coxph, yet pass the cox.zph test. Should I assume that cox.zph is insufficiently sensitive and take measures to account for non-proportionality of hazards anytime I know I'm looking at longevity data (which is believed to have a Gompertz distribution for mammals dying from 'old age')?

Second question: supposedly the way to handle such data with coxph is to either stratify it by age or include an interaction term with age in the formula. The first option is not optimal, because the hazard for a Gompertz distribution changes continuously and there are no natural intervals within which the hazard ratio is relatively constant. Therefore, I would like to use the second option, but including the age term in the formula produces an error that says "Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :  NA/NaN/Inf in foreign function call (arg 6)". My question is, what would be the correct way to include age in the formula?

If you have read this far, thank you kindly for your time. Below is code for reproducing the example I explained above.

# create the data
a<-c(1048, 773, 1072, 753, 888, 1038, 852, 950, 1335, 1227, 904, 634, 980, 1075, 1308, 787, 1079, 1151, 677, 1110, 363, 936, 774, 644, 1080, 1055, 1119, 975, 941, 1148, 1014, 541, 1297, 584, 847, 1136, 793, 1171, 985, 934, 852, 550, 944, 1165, 1190, 99, 1227, 1052, 1106, 888, 884, 555, 862, 1241, 841, 987, 1162, 1028, 1009, 1102, 1082, 1009, 1200, 902, 1215, 1121, 1177, 882, 766, 1366, 1190, 755, 958, 38, 925, 1131, 1031, 639, 704, 1097, 820, 570, 1029, 1205, 1284, 1139, 522, 1197, 1268, 1376, 842, 1310, 959, 1160, 824, 652, 1121, 1262, 1006, 1021);

b<-c(797, 897, 993, 1174, 998, 1117, 583, 698, 1125, 796, 1055, 953, 847, 892, 875, 1108, 1090, 1101, 1104, 991, 882, 1036, 889, 843, 1129, 1018, 936, 1035, 906, 1192, 1028, 1109, 790, 897, 544, 978, 1108, 1004, 730, 1093, 845, 1014, 1040, 1080, 921, 958, 971, 950, 1026, 882, 1055, 703, 901, 991, 863, 1043, 999, 784, 908, 987, 1040, 760, 887, 1028, 808, 1077, 812, 843, 1002, 639, 905, 808, 850, 1112, 736, 851, 1008, 990, 516, 1015, 942, 993, 1127, 959, 963, 1069, 940, 815, 926, 1005, 983, 1093, 898, 901, 1132, 1011, 808, 905, 1129, 840);

# format the data for coxph
age<-c(a,b); group<-as.factor(c('a','b'),c(100,100)));

# run coxph; for simplicity, the data is uncensored
ab.cox<-coxph(Surv(age)~group);

# the results indicate a singificant difference between the groups
ab.cox;

# however, it appears that the hazards are not proportional and the result is not interpretable
cox.zph(ab.cox);

# including age in an interaction term leads to an error
ab.int.cox<-coxph(Surv(age)~group+group:age);

# the following is like above; 'c' is from a population differs from 'a' in both parameters
# and coxph indicates a highly significant difference, but this time the data seems to not
# violate the proportional hazards assumption

c<-c(526, 858, 36, 814, 402, 902, 653, 779, 776, 971, 789, 824, 680, 1017, 957, 607, 1142, 1144, 888, 861, 868, 1052, 825, 654, 443, 1023, 727, 612, 1017, 888, 811, 820, 780, 846, 844, 272, 1083, 855, 903, 783, 1036, 1026, 462, 1007, 777, 1124, 1077, 1041, 900, 963, 332, 740, 1012, 1017, 705, 864, 1053, 643, 1195, 795, 1040, 792, 714, 971, 832, 558, 1107, 1038, 1149, 448, 605, 921, 772, 994, 1034, 949, 788, 1013, 788, 1038, 956, 850, 1045, 747, 967, 814, 661, 657, 730, 966, 676, 344, 743, 643, 1004, 766, 942, 1040, 1067, 409);

age2<-c(a,c); group2<-as.factor(rep(c('a','c'),c(100,100)));
ac.cox<-coxph(Surv(age2)~group2);
ac.cox;
cox.zph(ac.cox);