[R] Some questions on Rpart algorithm

Tue Oct 17 16:03:04 CEST 2006

Hello:
  I am using rpart and would like more background on how the splits are made
and how to interpret results - also how to properly use text(.rpart). I have
looked through Venables and Ripley and through the rpart help and still have
some questions. If there is a source (say, Breiman et al.) on decision trees
that would clear this all up, please let me know. The questions below
pertain to a classification task (i.e., I'm using the "class" method). Many
thanks in advance. 

(1) I'd like text(.rpart) to print percentages of each class rather than
counts. I don't see an option for this, so I would like to modify
text.rpart. However, I can't find the source since it is a method that's
"hidden". How can I find the source? 

(2) printcp prints a table with columns cp, nsplit, rel error, xerror, xstd.
I am guessing that cp is complexity, nsplit is the number of the split, rel
error is the error on the test set, xerror is the cross-validation error and xstd is
standard deviation of error across the cross-validation sets. Is there any
documentation on this? For instance, how exactly is complexity computed? 

(3) What's a "loss matrix"? Is it the cost placed on each type of
misclassification? 

(4) [More of a methodology question] In practice, when would one use
different costs on different splitting variables?

Thanks for any help on this.

  Jeff