[R] Estimating error rate for a classification tree

Tue Jan 25 03:22:03 CET 2005

Hi,

I created an rpart object and pruned the tree using
the 1-SE rule. I used 10-fold cross-validation while
creating the tree. Then I extracted the
cross-validated predictions for my data points using
xpred.rpart and computed statistics such as
precision, recall, and overall error rate.
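
For concreteness, here is a minimal sketch of that workflow. It is
not my actual code: the kyphosis data set that ships with rpart
stands in for my data, the choice of "present" as the positive
class is an assumption, and the variable names (cp.1se, tab, and
so on) are made up for illustration.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the real data here.
set.seed(1)  # fixes the random fold assignment used inside rpart()
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", control = rpart.control(xval = 10))

# 1-SE rule: the smallest tree whose cross-validated error is within
# one standard error of the minimum cross-validated error.
cpt    <- fit$cptable
best   <- which.min(cpt[, "xerror"])
thresh <- cpt[best, "xerror"] + cpt[best, "xstd"]
cp.1se <- cpt[which(cpt[, "xerror"] <= thresh)[1], "CP"]
pruned <- prune(fit, cp = cp.1se)

# Cross-validated predictions at the chosen cp. For a classification
# tree these are integer class codes, one per observation.
xp   <- xpred.rpart(fit, xval = 10, cp = cp.1se)
lev  <- levels(kyphosis$Kyphosis)
pred <- factor(lev[as.vector(xp)], levels = lev)
tab  <- table(truth = kyphosis$Kyphosis, predicted = pred)

# Summary statistics, treating "present" as the positive class.
error.rate <- 1 - sum(diag(tab)) / sum(tab)
precision  <- tab["present", "present"] / sum(tab[, "present"])
recall     <- tab["present", "present"] / sum(tab["present", ])
```

Note that xpred.rpart draws its own fold assignment, separate from
the one rpart() used to fill in the cptable.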

However, these values change each time I run
xpred.rpart, presumably because the observations are
randomly assigned to folds before cross-validation.
What should I do in this case? I am inclined to treat
the statistics as normally distributed random
variables, so that with, say, 100 runs I can estimate
the mean with a confidence interval.
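
A rough sketch of what I have in mind, again on the kyphosis data
rather than my own. The cp value 0.05 is an arbitrary placeholder
for the 1-SE choice, cv.error is a helper name I made up, and the
t-based interval rests on exactly the normality and independence
assumptions I am unsure about.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")
cp.use <- 0.05  # hypothetical cp; substitute the 1-SE value

# One cross-validated error rate per seed. Each seed gives a
# different random assignment of observations to the 10 folds.
cv.error <- function(seed) {
  set.seed(seed)
  xp <- xpred.rpart(fit, xval = 10, cp = cp.use)
  # xp holds integer class codes; compare with the true codes.
  mean(as.vector(xp) != as.integer(kyphosis$Kyphosis))
}
errs <- sapply(1:100, cv.error)

# Mean error rate and a normal-theory 95% interval for that mean.
m  <- mean(errs)
se <- sd(errs) / sqrt(length(errs))
ci <- m + c(-1, 1) * qt(0.975, df = length(errs) - 1) * se
```

At best, such an interval reflects the variation caused by the fold
assignment alone, since every run reuses the same observations.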

However, I suspect that these repeated runs may not
be independent of each other. I would greatly
appreciate it if someone could make a suggestion.

Best regards