[R] Recursive partitioning algorithms in R vs. alia

Terry Therneau therneau at mayo.edu
Tue Jun 23 15:21:37 CEST 2009


A point of history:

  Both the commercial CART program and the rpart() function are based on the 
book Classification and Regression Trees (Breiman, Friedman, Olshen, Stone, 
1984).  As a reader/commentator on one of the early drafts I got to know the 
material well.  CART started as a large Fortran program written by Jerry 
Friedman which was the testing ground for the ideas in the book.  I had the code 
at one time and made some modifications to it, but found it too frustrating to 
go very far with. Fortran is just too clumsy for a recursive task, and Jerry's 
ability to hold upteen variables in his head at once greater than mine -- the 
Fortran was a large monlithic block.  Salford Systems aquired rights to that 
code; I don't know whether any of the original lines remain in their product.  I 
had lots of conversations with their main programmer (15-20 years ago now) about 
methods for speeding it up; mainly an interesting problem in optimal indexing.
    
   When rpart was first written it's output agreed with CART almost entirely.  
The only major difference was in surrogates: I pick the surrogate with the 
largest number of agreements, CART picked that with the greatest % agreement.  
This means that rpart favors variables with fewer missing values.  Since that 
point in time both codes have evolved.  I haven't had time to do important work 
on rpart in over a decade.  It' not surprising that the graphics and display are 
behind the curve, what's more surprising is that it still endures.
   
   Rpart is called "rpart" because the authors copyrighted the term "CART" for 
their program.  It was the best alternative name that I could come up with at 
the time.  I find it amusing that one consequence of their copyright choice is 
that I now see "recursive partitioning"  far more often than "CART" as the 
generic label for tree based methods.
   
   Terry T




More information about the R-help mailing list