[R] survival analysis using rpart

Walter345 walter345 at yahoo.com
Mon Feb 26 18:37:56 CET 2007


Hello,

I use rpart to predict survival time and have trouble interpreting the
“estimated rate” in the output. Here is an example of what I do:

> library(survival)  # provides Surv()
> library(rpart)

> stagec <-
+ read.table("http://www.stanford.edu/class/stats202/DATA/stagec.data",
+ col.names=c("pgtime", "pgstat", "age", "eet", "g2", "grade", "gleason",
+ "ploidy"))

> fit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason +
+ ploidy, data=stagec)


Result:

1) root 146 195.411600 1.0000000  
   2) grade< 2.5 61  45.021520 0.3624701  
     4) g2< 11.36 33   9.120116 0.1225562 *
     5) g2>=11.36 28  27.804100 0.7335298  
      10) gleason< 5.5 20  14.376900 0.5292190 *
      11) gleason>=5.5 8  11.201470 1.3083680 *
   3) grade>=2.5 85 125.327400 1.6190620  
     6) age>=56.5 75 104.154700 1.4287310  
      12) gleason< 7.5 50  66.701410 1.1431320 *
      13) gleason>=7.5 25  33.993130 2.0355220  
        26) g2>=15.29 13  16.555970 1.3494740 *
        27) g2< 15.29 12  14.220260 2.9210480 *
     7) age< 56.5 10  15.522810 3.1977430 *
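
A minimal sketch of how the printout above can be inspected. It assumes that the copy of stagec bundled with the rpart package can stand in for the file read from the URL (the two may not be identical), and `node_table` is just an illustrative name. The label "estimated rate" appears in the summary() output; the same value is the last column (yval) of the printed tree. The root is shown with a value of 1, so the other values appear to be rates relative to the root under rpart's exponential scaling, as far as I understand the rpart documentation.

```r
library(survival)
library(rpart)

# Assumption: the stagec data bundled with rpart stands in for the file at the URL.
data(stagec, package = "rpart")

fit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy,
             data = stagec)

fit$method     # method rpart selected for the Surv() response
print(fit)     # legend line names the columns: node), split, n, deviance, yval
summary(fit)   # per-node details, including the events and the estimated rate

# One row per node; the row names of fit$frame are the node numbers in the printout.
node_table <- data.frame(node = rownames(fit$frame),
                         n    = fit$frame$n,
                         rate = fit$frame$yval)
node_table
```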

Let’s look at the terminal node 4:

#	PGTIME	PGSTAT	AGE	EET	G2	GRADE	GLEASON	PLOIDY
1	8.657084	0	70	1	4.43	1	3	1
2	16.70088	0	56	2	5.29	1	3	1
3	3.162217	1	62	2	3.57	2	4	1
4	10.20123	0	63	2	5.14	2	5	1
5	4.479124	0	63	2	5.75	2	5	1
6	6.516084	0	66	2	5.92	2	5	1
7	4.936345	0	67	2	6.41	2	5	1
8	10.79808	0	72	1	6.68	2	NA	1
9	9.174537	0	62	1	6.74	2	5	1
10	10.87474	0	72	2	6.8	2	5	1
11	7.028062	0	52	2	7.15	2	7	1
12	11.36481	0	59	2	7.61	2	5	1
13	10.17659	0	64	1	7.61	2	NA	1
14	6.96783	0	67	2	7.78	2	6	1
15	10.61738	0	55	2	7.81	2	5	1
16	6.510609	0	70	1	7.88	2	6	1
17	10.36276	0	55	2	8.1	2	5	1
18	6.694045	0	54	2	8.11	2	4	1
19	11.718	0	61	2	8.4	2	5	1
20	7.301847	0	69	2	8.46	2	5	1
21	6.067077	0	69	2	8.58	2	6	1
22	8.353182	0	59	2	8.76	2	6	1
23	5.541409	0	59	1	9.01	2	5	1
24	5.492128	0	61	2	9.42	2	5	1
25	7.208761	0	63	1	9.76	2	5	1
26	6.004106	0	52	2	9.9	2	4	1
27	5.664613	0	71	1	10.16	2	6	1
28	6.130047	0	64	2	10.26	2	4	1
29	9.812457	0	64	1	10.51	2	5	1
30	6.275154	0	62	2	10.82	2	6	1
31	9.253935	0	61	2	11.23	2	5	1
32	5.201916	0	54	2	11.35	2	6	1
33	6.22861	0	65	2	11.35	2	5	1
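
A listing like the one above can be pulled out of the fitted object. The sketch below again assumes the stagec copy bundled with rpart, and assumes that rpart dropped no rows, so that `fit$where` lines up with the rows of the data; `node_id` and `node4` are illustrative names.

```r
library(survival)
library(rpart)

# Assumption: the stagec data bundled with rpart stands in for the file at the URL.
data(stagec, package = "rpart")

fit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy,
             data = stagec)

# Guard for the alignment assumption: one entry in fit$where per row of stagec.
stopifnot(length(fit$where) == nrow(stagec))

# fit$where holds, for each observation, the row of fit$frame of its terminal node;
# the row names of fit$frame are the node numbers used in the printout.
node_id <- as.integer(rownames(fit$frame))[fit$where]

node4 <- stagec[node_id == 4, ]
node4[order(node4$g2), ]                        # observations in node 4, sorted by g2
c(n = nrow(node4), events = sum(node4$pgstat))  # node size and number of events
```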

Here we have 33 observations and 1 event. The “estimated rate” is 0.1225562.
My questions are:

(1) Is the “estimated rate” the estimated hazard rate ratio? 
(2) How does rpart calculate this rate?
(3) Suppose I use xpred.rpart(fit, xval=10) to perform 10-fold
cross-validation using (a) the complete stagec data set and (b) a
subset of it, say, only the columns age, eet, and g2. For the i-th
patient, I am likely to obtain a different estimated rate from each fit.
How can I meaningfully compare the two rates? How can I say which one is
“better”?
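
One possible way to set up the comparison in (3), offered as a sketch rather than as the established answer: cross-validate both trees on the same folds and score each set of predicted rates by its concordance with the observed survival times. The concordance measure is my choice for illustration and is not something the rpart output itself provides. The names `folds`, `fit_full`, `fit_sub`, and `cv` are hypothetical, and the stagec copy bundled with rpart is assumed to stand in for the file at the URL.

```r
library(survival)
library(rpart)

# Assumption: the stagec data bundled with rpart stands in for the file at the URL.
data(stagec, package = "rpart")

fit_full <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy,
                  data = stagec)
fit_sub  <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2, data = stagec)

# Use the same 10 cross-validation groups for both models so they are comparable.
set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(stagec)))

xp_full <- xpred.rpart(fit_full, xval = folds)
xp_sub  <- xpred.rpart(fit_sub,  xval = folds)

# Each column corresponds to one complexity value; take the last (largest tree).
cv <- data.frame(pgtime    = stagec$pgtime,
                 pgstat    = stagec$pgstat,
                 rate_full = xp_full[, ncol(xp_full)],
                 rate_sub  = xp_sub[, ncol(xp_sub)])

# A higher predicted rate should mean earlier progression, hence reverse = TRUE.
c_full <- concordance(Surv(pgtime, pgstat) ~ rate_full, data = cv, reverse = TRUE)
c_sub  <- concordance(Surv(pgtime, pgstat) ~ rate_sub,  data = cv, reverse = TRUE)

c(full = c_full$concordance, subset = c_sub$concordance)
```

Because concordance depends only on the ranking of the predicted rates, it sidesteps the question of whether the two sets of rates are on the same scale; it says nothing about calibration.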

Thanks a lot for all comments!
Walter




