[R] help with RPART

Terry Therneau therneau at mayo.edu
Mon Jun 2 17:30:59 CEST 2008

```  When using the anova method, all of the printed results are scaled by the
RSS (residual sum of squares) for the top node.  Therefore the relative error
measures for the trees are already 1-R^2.

tfit <- rpart(time ~ ., lung)
summary(tfit)

          CP nsplit rel error   xerror      xstd
1 0.03665178      0 1.0000000 1.010097 0.1136942
2 0.03310179      1 0.9633482 1.079216 0.1172675
3 0.03029365      2 0.9302464 1.109587 0.1173583
4 0.01963453      3 0.8999528 1.249586 0.1327888
5 0.01627146     11 0.7396726 1.238411 0.1310952
6 0.01507635     12 0.7234012 1.260919 0.1337384
7 0.01031566     13 0.7083248 1.282740 0.1399397
8 0.01000000     14 0.6980091 1.296213 0.1396711

Node number 1: 228 observations,    complexity param=0.03665178
mean=305.2325, MSE=44176.93
left son=2 (81 obs) right son=3 (147 obs)
Primary splits:
pat.karno < 75    to the left,  improve=0.03661157, (3 missing)
ph.ecog   < 1.5   to the right, improve=0.03620793, (1 missing)
status    < 1.5   to the right, improve=0.02930372, (0 missing)
ph.karno  < 85    to the left,  improve=0.02058114, (1 missing)
sex       < 1.5   to the left,  improve=0.01679999, (0 missing)
Surrogate splits:
ph.ecog  < 1.5   to the right, agree=0.787, adj=0.392, (3 split)
ph.karno < 75    to the left,  agree=0.751, adj=0.291, (0 split)
age      < 72.5  to the right, agree=0.680, adj=0.089, (0 split)

Node number 2: 81 observations,    complexity param=0.03310179
mean=251.0247, MSE=34100.99
left son=4 (59 obs) right son=5 (22 obs)
Primary splits:
wt.loss < 21    to the left,  improve=0.12735970, (7 missing)
status  < 1.5   to the right, improve=0.08060663, (0 missing)
age     < 68.5  to the right, improve=0.04906869, (0 missing)
inst    < 2.5   to the left,  improve=0.04148716, (0 missing)
sex     < 1.5   to the left,  improve=0.02401074, (0 missing)
Surrogate splits:
ph.karno < 55    to the right, agree=0.743, adj=0.095, (6 split)

etc.

The first split has R^2 = .0367 = 1 - rel error after one split (second row of
the table at the top: 1 - 0.9633), which is essentially the improvement measure
for the node (improve=0.0366).

The second split has R^2 = .127 for the observations within that node; it
improves the R^2 for the model as a whole by .033 (the complexity param shown
for node 2).

Terry T.

```
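
The arithmetic in the post can be checked with a few lines of R. This is only a sketch: `r2_from_cptable` is a hypothetical helper name (not part of rpart), and the small matrix below is typed in by hand from the first four rows of the printed table rather than taken from a fitted model. It turns the `rel error` column into R^2 and shows how much each additional row adds.

```r
# Hypothetical helper (not part of rpart): convert a CP table into R^2 values.
# Assumes the table has columns named "nsplit" and "rel error", as in the
# printed output in the post.
r2_from_cptable <- function(cptable) {
  r2 <- 1 - cptable[, "rel error"]      # rel error is already RSS / RSS(root)
  data.frame(
    nsplit = cptable[, "nsplit"],
    r2     = r2,
    gain   = c(NA, diff(r2))            # R^2 added relative to the previous row
  )
}

# First four rows of the CP table from the post, entered by hand.
cp <- cbind(
  "CP"        = c(0.03665178, 0.03310179, 0.03029365, 0.01963453),
  "nsplit"    = c(0, 1, 2, 3),
  "rel error" = c(1.0000000, 0.9633482, 0.9302464, 0.8999528)
)

res <- r2_from_cptable(cp)
print(res)
```

With a fitted tree, the same helper should be usable as `r2_from_cptable(tfit$cptable)`, since the `cptable` component of an rpart object carries those column names. Note that when `nsplit` jumps by more than one between rows (as it does from 3 to 11 in the full table), `gain` is the combined R^2 added by all of those splits.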