[R] how is xerror calculated in rpart?

Fri Apr 30 19:55:41 CEST 2010

* On Thu 05:53PM -0700, 29 Apr 2010, Seth (sjmyers at syr.edu) wrote:
>
> Hi,
>
> I've searched online, in a few books, and in the archives, but haven't seen
> this.  I believe that xerror is scaled to rel error on the first split.
> After fitting an rpart object, is it possible with a little math to
> determine the percentage of true classifications represented by a xerror
> value?  -seth

xerror is the cross-validated error.  By default rpart computes it with 10-fold
cross-validation (the xval argument; see help(rpart.control)).

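If your misclassification costs are uniform, an xerror value of 0.9 means that
the cross-validated misclassification rate is 0.9 times the misclassification
rate of the trivial tree with no splits.  The rate of the trivial tree is easy
to calculate, because it assigns all cases to the same class: the one that
minimizes the rate.  printcp() reports it as "Root node error".  Multiplying
xerror by that rate gives the cross-validated misclassification rate, and one
minus the product is the proportion of cases classified correctly.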
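
As a minimal sketch of that arithmetic, assuming uniform costs and default
priors, the following uses the kyphosis data that ships with rpart.  The
variable names (root_rate, xval_rate, xval_accuracy) are mine, and the numbers
change with the random cross-validation folds, so treat this as an illustration
rather than a recipe:

```r
library(rpart)

# kyphosis ships with rpart; no loss matrix, so costs are uniform
set.seed(1)  # the cross-validation folds are drawn at random
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

# Misclassification rate of the trivial tree:
# one minus the proportion of the most frequent class
root_rate <- 1 - max(table(kyphosis$Kyphosis)) / nrow(kyphosis)

# The "xerror" column of the cp table is relative to the root error,
# so multiplying by root_rate puts it back on the 0-1 scale
cp <- fit$cptable
xval_rate     <- cp[, "xerror"] * root_rate  # cross-validated error rate
xval_accuracy <- 1 - xval_rate               # proportion classified correctly

cbind(cp[, c("CP", "nsplit", "xerror")], xval_rate, xval_accuracy)
```

Each row of the result corresponds to one row of the table printed by
printcp(fit).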

In general, xerror is computed from the misclassification *risk*, which takes
into account the loss matrix.
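
To make the risk of the trivial tree concrete, here is a small hand
calculation.  The class counts and the loss matrix are made up for
illustration, and I am assuming the convention that loss[i, j] is the cost of
predicting class j when the true class is i, with class proportions taken from
the data:

```r
# Illustrative class counts and loss matrix (not from a fitted model)
n <- c(absent = 64, present = 17)

# loss[i, j] = cost of predicting class j when the true class is i;
# the diagonal is zero because correct predictions cost nothing
loss <- matrix(c(0, 1,
                 4, 0), nrow = 2, byrow = TRUE,
               dimnames = list(true = names(n), predicted = names(n)))

p <- n / sum(n)                    # class proportions at the root
risk_by_class <- drop(p %*% loss)  # expected loss per case for each prediction
root_risk <- min(risk_by_class)    # the trivial tree picks the cheapest class

names(which.min(risk_by_class))    # class assigned by the trivial tree
root_risk                          # risk per case of the trivial tree
```

With these numbers the cheapest single prediction is the minority class,
because missing it costs four times as much.  Under the assumptions above, an
xerror of 0.9 would then mean a cross-validated risk of 0.9 * root_risk per
case, which is no longer a plain misclassification rate.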

This paper goes into some detail about rpart:

@Article{         therneau.atkinson97,
  author        = {Therneau, T.M. and Atkinson, E.J.},
  title         = {An Introduction to Recursive Partitioning Using the
                  {RPART} Routines},
  journal       = {Mayo Clinic Technical Reports},
  year          = {1997},
  url           = {http://mayoresearch.mayo.edu/mayo/research/biostat/upload/61.pdf}
}

--
Best,
Hsiu-Khuern.