[R] recursive partitioning in R

Therneau, Terry M., Ph.D. therneau at mayo.edu
Thu Nov 12 13:44:37 CET 2015


Look at the rpart vignette "User written split functions".  It shows how to add your own 
splitting method to rpart (in R, no C required).  This has proven to be very useful for 
trying out new ideas.
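
As a rough sketch of what such a method looks like: rpart accepts a list with init, eval 
and split functions as its method argument.  The code below is written from memory after 
the pattern of the anova example in that vignette, so check the details against the 
vignette itself.  The function names (my_init, my_eval, my_split) and the use of the 
mtcars data are my own choices for illustration.

```r
library(rpart)

# init: check the response and describe it to rpart
my_init <- function(y, offset, parms, wt) {
    if (is.matrix(y) && ncol(y) > 1) stop("matrix response not allowed")
    if (length(offset)) y <- y - offset
    sfun <- function(yval, dev, wt, ylevel, digits) {
        paste0("  mean=", format(signif(yval, digits)),
               ", MSE=", format(signif(dev / wt, digits)))
    }
    environment(sfun) <- .GlobalEnv
    list(y = c(y), parms = NULL, numresp = 1, numy = 1, summary = sfun)
}

# eval: node label (weighted mean) and node deviance (weighted RSS)
my_eval <- function(y, wt, parms) {
    wmean <- sum(y * wt) / sum(wt)
    list(label = wmean, deviance = sum(wt * (y - wmean)^2))
}

# split: goodness of every candidate split of one variable within one node.
# For a continuous x, y and wt arrive sorted by x, and split i sends the
# first i observations to one side.
my_split <- function(y, wt, x, parms, continuous) {
    n <- length(y)
    y <- y - sum(y * wt) / sum(wt)          # center y within the node
    if (continuous) {
        temp     <- cumsum(y * wt)[-n]
        left.wt  <- cumsum(wt)[-n]
        right.wt <- sum(wt) - left.wt
        lmean <- temp / left.wt
        rmean <- -temp / right.wt
        list(goodness  = (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2),
             direction = sign(lmean))
    } else {
        # categorical x: order the levels by their mean, then treat as ordered
        ux    <- sort(unique(x))
        wtsum <- tapply(wt, x, sum)
        ysum  <- tapply(y * wt, x, sum)
        ord   <- order(ysum / wtsum)
        m     <- length(ord)
        temp     <- cumsum(ysum[ord])[-m]
        left.wt  <- cumsum(wtsum[ord])[-m]
        right.wt <- sum(wt) - left.wt
        lmean <- temp / left.wt
        rmean <- -temp / right.wt
        list(goodness  = (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2),
             direction = ux[ord])
    }
}

my_method <- list(init = my_init, eval = my_eval, split = my_split)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = my_method, minsplit = 10)
print(fit)
```

To try a different in-sample criterion, change the goodness computed in my_split (and, if 
needed, the deviance in my_eval); the rest of the rpart machinery stays as it is.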

The second piece would be to do your own cross-validation.  That is, turn off the built-in 
cross-validation using the xval=0 option, then explicitly do the cross-validation 
yourself.  Fit a new tree to some chosen subset of the data, using your split rule of 
course, and then use predict() to get predicted values for the remaining observations.  
Again, this is all in R, and you can explicitly control which observations are in-bag and 
which are out-of-bag.  The xpred.rpart function may be useful to automate some of the steps.
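
A minimal sketch of that loop, under some assumptions of mine: the mtcars data, 5 folds, 
method="anova" standing in for your own method list, and a hypothetical out-of-sample 
criterion oos_loss (mean absolute error here) that differs from the in-sample split 
criterion.  The last few lines show xpred.rpart doing the refit-and-predict step for you, 
given the same fold assignment.

```r
library(rpart)

set.seed(1)                                   # reproducible fold assignment
k    <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Hypothetical out-of-sample criterion; replace with your own
oos_loss <- function(obs, pred) mean(abs(obs - pred))

cv_loss <- numeric(k)
for (i in seq_len(k)) {
    train <- mtcars[fold != i, ]
    test  <- mtcars[fold == i, ]
    # xval = 0 turns off rpart's built-in cross-validation
    fit_i <- rpart(mpg ~ wt + hp + disp, data = train, method = "anova",
                   control = rpart.control(xval = 0, minsplit = 10))
    cv_loss[i] <- oos_loss(test$mpg, predict(fit_i, newdata = test))
}
mean(cv_loss)                                 # cross-validated loss

# Same idea via xpred.rpart: one column of held-out predictions per cp value
full <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
              control = rpart.control(minsplit = 10))
xp <- xpred.rpart(full, xval = fold)
apply(xp, 2, function(p) oos_loss(mtcars$mpg, p))
```

The explicit loop gives full control over how each tree is fit and pruned; xpred.rpart is 
shorter when all you need is the held-out predictions at each complexity value.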

If you look up rpart on CRAN, you will see a link to the package source.  If you read the 
C source code you will discover that 95% of it is boring bookkeeping: which observations 
are in which part(s) of the tree, sorting the data, tracking missing values, etc.  If you 
ever do want to write your own code you are more than welcome to build off this; I 
wouldn't want to write that part again.

Terry Therneau

On 11/12/2015 05:00 AM, r-help-request at r-project.org wrote:
> Dear List,
>
> I'd like to make a few modifications to the typical CART algorithm, and
> I'd rather not code the whole thing from scratch.  Specifically I want
> to use different in-sample and out-of-sample fit criteria in the split
> choosing and cross-validation stages.
>
> I see however that the code for CART in both the rpart and the tree
> packages is written in C.
>
> Two questions:
>
>    * Where is the C code?  It might be possible to get a C-fluent
>      programmer to help me with this.
>    * Is there any code for CART that is written entirely in R?
>
> Thanks,
> Andrew


