[R] Re: How to prune using holdout data

Alfredo alfredo.roccato at fastwebnet.it
Mon Feb 27 16:48:59 CET 2017


Thank you, Terry, for your answer. 

I'll try to explain my question better. When you create a classification or
regression tree you first grow a tree based on a splitting criterion: this
usually results in a large tree that provides a good fit to the training
data. The problem with this tree is its potential for overfitting the data:
the tree can be tailored too specifically to the training data and not
generalize well to new data. The solution (apart from cross-validation) is to
find a smaller subtree that results in a low error rate on holdout or
validation data.
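
To make this concrete, here is a minimal sketch of the kind of procedure I
have in mind. It is only an illustration under assumptions of mine: it uses
the kyphosis data shipped with rpart, an arbitrary 70/30 split, the
misclassification rate as the error measure, and "lowest holdout error" as
the selection rule. The names train, hold, hold_err and best_cp are just
placeholders.

```r
library(rpart)

set.seed(1)
# Illustrative split (an assumption): 70% training, 30% holdout
idx   <- sample(nrow(kyphosis), size = round(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
hold  <- kyphosis[-idx, ]

# Grow a deliberately large tree: cp = 0 disables stopping on complexity,
# xval = 0 switches off the built-in cross-validation
fit <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class",
             control = rpart.control(cp = 0, minsplit = 5, xval = 0))

# Each row of the cp table corresponds to one subtree of the nested sequence
cps <- fit$cptable[, "CP"]

# Misclassification rate on the holdout data for each candidate subtree
hold_err <- sapply(cps, function(cp) {
  pred <- predict(prune(fit, cp = cp), newdata = hold, type = "class")
  mean(pred != hold$Kyphosis)
})

# Assumed selection rule: lowest holdout error. which.min returns the first
# minimum, and the cp table lists the smallest tree first, so ties go to the
# smaller tree.
best_cp <- cps[which.min(hold_err)]
pruned  <- prune(fit, cp = best_cp)
```

Other choices of error measure or selection rule would slot into the same
loop; only the function computing hold_err and the rule for picking best_cp
would change.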

I hope this helps to clarify my question.

Best,

Alfredo

-----Original Message-----
From: Therneau, Terry M., Ph.D. [mailto:therneau at mayo.edu] 

You will need to give more detail of exactly what you mean by "prune using a
validation set".  The prune.rpart function will prune at any value you want;
what I suspect you are looking for is to compute the error of each possible
tree, using a validation data set, then find the best one, and then prune
there.

How do you define "best"?

