[R] cross-validation in rpart

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Mar 19 16:08:48 CET 2011


On Sat, 19 Mar 2011, Penny B wrote:

> I am trying to find out what type of sampling scheme is used to select the 10
> subsets in 10-fold cross-validation process used in rpart to choose the best
> tree. Is it simple random sampling? Is there any documentation available on
> this?

Not SRS (at least in its conventional meaning), as it is
partitioning: the 10 folds are disjoint.
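
To illustrate what a random partition into disjoint folds looks like, here 
is a minimal sketch in base R.  It shows one common way to do it and is 
not claimed to be exactly what rpart does internally; the names n, k, 
fold and idx are made up for the illustration.

```r
set.seed(1)                      # only so the sketch is reproducible
n <- 103                         # hypothetical number of observations
k <- 10                          # number of folds

# Repeat the labels 1..k out to length n, then shuffle them at random.
fold <- sample(rep(seq_len(k), length.out = n))

table(fold)                      # fold sizes differ by at most one

# Each observation carries exactly one label, so the folds are disjoint
# and together cover all n observations.
idx <- split(seq_len(n), fold)
stopifnot(length(unlist(idx)) == n, !anyDuplicated(unlist(idx)))
```

Contrast this with drawing 10 independent simple random samples, where 
the same observation could turn up in more than one subset.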

Note that this happens in two places, in rpart() and in xpred.rpart(), 
but the (default) method is the same.  I presume you asked about the 
first, but it wasn't clear.
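
For orientation, the two places can be exercised roughly as follows.  This 
is a sketch only: the kyphosis data (shipped with rpart) and the seed are 
arbitrary choices for the example, and xval = 10 is written out explicitly 
rather than relying on the default.

```r
library(rpart)

set.seed(1)   # the folds are chosen at random, so fix the seed to reproduce

# 1. Cross-validation done while fitting: controlled via rpart.control().
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(xval = 10))
printcp(fit)  # the xerror and xstd columns come from the cross-validation

# 2. Cross-validated predictions from an already fitted tree.
xp <- xpred.rpart(fit, xval = 10)
dim(xp)       # one row per observation
```

The two calls draw their folds separately, so the partitions used in 
steps 1 and 2 will in general differ.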

There is a lot of documentation on the meaning of '10-fold 
cross-validation', e.g. in my 1996 book.  There are a few slightly 
different ways to do it, and you can read the rpart sources if you 
want to know the details.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


