[R] Re: How to prune using holdout data
alfredo.roccato at fastwebnet.it
Mon Feb 27 16:48:59 CET 2017
Thank you, Terry, for your answer.
I'll try to explain my question better. When you create a classification or
regression tree, you first grow a tree based on a splitting criterion: this
usually results in a large tree that fits the training data well. The problem
with this tree is its potential for overfitting: the tree can be tailored too
specifically to the training data and not generalize well to new data. The
solution (apart from cross-validation) is to find a smaller subtree that
results in a low error rate on holdout or validation data.
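To make the overfitting point concrete, here is a minimal R sketch. It assumes the rpart package and its bundled `kyphosis` data, a 70/30 random split and the control settings shown, none of which come from the original question. It grows a deliberately large tree on the training rows and compares its misclassification rate there with the rate on the held-out rows; `misclass` is a hypothetical helper, not part of rpart.

```r
library(rpart)

# Assumed setup: 70/30 random split of the kyphosis data shipped with rpart
set.seed(1)
idx   <- sample(nrow(kyphosis), size = floor(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
valid <- kyphosis[-idx, ]

# Grow a deliberately large tree: no complexity penalty (cp = 0),
# small minsplit, and no internal cross-validation (xval = 0)
big <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class",
             control = rpart.control(cp = 0, minsplit = 5, xval = 0))

# Hypothetical helper: misclassification rate of a fitted tree on a data set
misclass <- function(fit, data) {
  mean(predict(fit, newdata = data, type = "class") != data$Kyphosis)
}

train_err <- misclass(big, train)   # error on the rows used to grow the tree
valid_err <- misclass(big, valid)   # error on the held-out rows
cat("training error:", round(train_err, 3),
    " holdout error:", round(valid_err, 3), "\n")
```

With a fully grown tree the training error is usually well below the holdout error; the exact numbers depend on the random split.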
I hope this helps to clarify my question.
From: Therneau, Terry M., Ph.D. [mailto:therneau at mayo.edu]
You will need to give more detail about exactly what you mean by "prune using a
validation set". The prune.rpart function will prune at any value you want.
What I suspect you are looking for is to compute the error of each possible
tree using a validation data set, then find the best one, and then prune.
How do you define "best"?
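The procedure Terry describes can be sketched in R as follows, as an illustration under assumptions not stated in the thread: the `kyphosis` data, a 70/30 split, and "best" taken to mean the lowest holdout misclassification rate. It evaluates only the nested subtrees of the cost-complexity sequence stored in the fitted tree's `cptable` (the trees that prune.rpart can produce), rather than literally every possible subtree. `holdout_cp_error` is a hypothetical helper; `rpart`, `rpart.control`, `prune` and `predict` are provided by R and the rpart package.

```r
library(rpart)

# Hypothetical helper (not part of rpart): holdout misclassification rate
# of every subtree in the cost-complexity sequence stored in fit$cptable
holdout_cp_error <- function(fit, valid, response) {
  cps <- fit$cptable[, "CP"]
  err <- vapply(cps, function(cp) {
    sub  <- prune(fit, cp = cp)                      # subtree for this CP value
    pred <- predict(sub, newdata = valid, type = "class")
    mean(pred != valid[[response]])                  # misclassification rate
  }, numeric(1))
  data.frame(CP = cps, nsplit = fit$cptable[, "nsplit"], holdout_err = err)
}

# Assumed setup: 70/30 random split of the kyphosis data shipped with rpart
set.seed(1)
idx   <- sample(nrow(kyphosis), size = floor(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
valid <- kyphosis[-idx, ]

# 1. Grow a large tree (no complexity penalty, no internal cross-validation)
big <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class",
             control = rpart.control(cp = 0, minsplit = 5, xval = 0))

# 2. Compute the holdout error of each candidate subtree
tab <- holdout_cp_error(big, valid, "Kyphosis")
print(tab)

# 3. "Best" is assumed to mean the lowest holdout error; cptable runs from
#    the largest CP (smallest tree) down, so which.min() picks the smallest
#    tree among ties
best <- tab$CP[which.min(tab$holdout_err)]

# 4. Prune the large tree at that CP value
pruned <- prune(big, cp = best)
print(pruned)
```

A different definition of "best" (for example, preferring a smaller tree whose holdout error is within some tolerance of the minimum) would change only step 3.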