[R] Recursive partitioning algorithms in R vs. alia

jude.ryan at ubs.com
Tue Jun 23 16:31:27 CEST 2009


Thanks for your point of view, Terry! It is always fascinating to follow
the history of the field, especially as told by someone involved with
it.

Jude Ryan

-----Original Message-----
From: Terry Therneau [mailto:therneau at mayo.edu] 
Sent: Tuesday, June 23, 2009 9:22 AM
To: Ryan, Jude; cgb at datanalytics.com
Cc: r-help at r-project.org
Subject: Re: [R] Recursive partitioning algorithms in R vs. alia

A point of history:

  Both the commercial CART program and the rpart() function are based on
the book Classification and Regression Trees (Breiman, Friedman, Olshen,
Stone, 1984).  As a reader/commentator on one of the early drafts I got
to know the material well.  CART started as a large Fortran program
written by Jerry Friedman which was the testing ground for the ideas in
the book.  I had the code at one time and made some modifications to it,
but found it too frustrating to go very far with.  Fortran is just too
clumsy for a recursive task, and Jerry's ability to hold umpteen
variables in his head at once is greater than mine -- the Fortran was a
large monolithic block.  Salford Systems acquired rights to that code; I
don't know whether any of the original lines remain in their product.  I
had lots of conversations with their main programmer (15-20 years ago
now) about methods for speeding it up; mainly an interesting problem in
optimal indexing.
    
   When rpart was first written its output agreed with CART almost
entirely.  The only major difference was in surrogates: I pick the
surrogate with the largest number of agreements, CART picked that with
the greatest % agreement.  This means that rpart favors variables with
fewer missing values.  Since that point in time both codes have evolved.
I haven't had time to do important work on rpart in over a decade.  It's
not surprising that the graphics and display are behind the curve;
what's more surprising is that it still endures.
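
To make the surrogate difference concrete, here is a minimal sketch in
Python (not code from rpart or CART) that scores two candidate
surrogates both ways.  The toy data, the agreement() helper and the
candidate names are all made up for illustration, and it assumes the
primary split variable is observed for every row.

```python
# Illustrative sketch only: toy data, hypothetical helper names.
# True = observation goes left, False = goes right, None = value missing.

def agreement(primary_left, surrogate_left):
    """Return (count, fraction) of rows where the surrogate sends an
    observation the same way as the primary split. Rows where the
    surrogate is missing are skipped, so they add nothing to the count
    and are left out of the fraction's denominator."""
    pairs = [(p, s) for p, s in zip(primary_left, surrogate_left)
             if s is not None]
    agree = sum(p == s for p, s in pairs)
    frac = agree / len(pairs) if pairs else 0.0
    return agree, frac

# Direction chosen by the primary split for 10 observations.
primary = [True, True, True, True, True,
           False, False, False, False, False]

# Candidate A: never missing, agrees with the primary on 8 of 10 rows.
cand_a = [True, True, True, True, False,
          False, False, False, False, True]

# Candidate B: missing on 6 rows, agrees on all 4 rows where observed.
cand_b = [True, True, None, None, None,
          False, False, None, None, None]

scores = {name: agreement(primary, cand)
          for name, cand in {"A": cand_a, "B": cand_b}.items()}

# Rule described for rpart: largest number of agreements.
best_by_count = max(scores, key=lambda n: scores[n][0])
# Rule described for CART: greatest percentage agreement.
best_by_pct = max(scores, key=lambda n: scores[n][1])

print(scores)
print("by count:", best_by_count, " by percent:", best_by_pct)
```

Candidate B agrees perfectly where it is observed but is missing on
most rows, so the percentage rule prefers it while the count rule
prefers candidate A; that is the sense in which counting agreements
favors variables with fewer missing values.  As far as I know, current
versions of rpart expose this choice through the surrogatestyle
argument of rpart.control().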
   
   Rpart is called "rpart" because the authors copyrighted the term
"CART" for their program.  It was the best alternative name that I could
come up with at the time.  I find it amusing that one consequence of
their copyright choice is that I now see "recursive partitioning" far
more often than "CART" as the generic label for tree based methods.
   
   Terry T
   
