[R] rpart puzzle

White.Denis at epamail.epa.gov
Thu Jul 12 20:22:50 CEST 2001


I haven't looked that carefully at rpart, but in tree the candidate splits
are midpoints between adjacent observed data values.  So if x7 had values
of 36 and 38, but not 37, a valid split would be < 37 and > 37: no observed
case falls exactly on the split point.
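
To make the midpoint rule concrete, here is a minimal sketch in Python. It is
not code from tree or rpart; the function name candidate_splits and the data
are made up for illustration. It lists the midpoints between adjacent distinct
observed values, which are the candidate thresholds under the rule described
above.

```python
def candidate_splits(values):
    """Return the midpoints between adjacent distinct observed values."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


# Illustrative data only: no case has x7 == 37.
x7 = [36, 38, 38, 41]
splits = candidate_splits(x7)

# Observed values that fall exactly on a candidate threshold (none here).
on_threshold = [v for v in x7 if v in splits]

print(splits)        # [37.0, 39.5]
print(on_threshold)  # []
```

Because no observed value coincides with a threshold built this way, "< 37"
and "<= 37" divide the observed data identically; the two forms can only
differ for a new case that lands exactly on the split point.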

Marc Feldesman <feldesmanm at pdx.edu>
Sent by: owner-r-help at stat.math.ethz.ch
07/12/2001 09:02

To:      r-help at stat.math.ethz.ch
cc:      therneau at mayo.edu
Subject: [R] rpart puzzle

I've been using the package rpart with R 1.3.0 for Windows to produce
simple classification trees for some measurement data from paleontological
specimens.  Both the rpart documentation and the output confirm that the
program produces splits on continuous data that leave "holes" in the
data.  It is probably of little practical importance, but is there a reason
why the binary splits are constructed in the form (e.g.):

x7 < 37
x7 > 37

as opposed to the actual CART (tm) methodology of:

x7 <= 37
x7 > 37

It seems to me that if one were to use rpart to classify an unknown case
where x7 = 37, the program wouldn't actually know which way to move the
case.
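
The concern can be sketched in a few lines of Python. This only illustrates a
literal reading of the two printed forms; it says nothing about how rpart
actually routes a case internally, and the function names route_strict and
route_cart are hypothetical.

```python
def route_strict(x, threshold):
    """Literal reading of 'x < t' / 'x > t': a tie matches neither branch."""
    if x < threshold:
        return "left"
    if x > threshold:
        return "right"
    return None  # x == threshold is not covered by either rule


def route_cart(x, threshold):
    """CART-style rule: 'x <= t' goes left, 'x > t' goes right."""
    return "left" if x <= threshold else "right"


tie_strict = route_strict(37, 37)
tie_cart = route_cart(37, 37)

print(tie_strict)  # None
print(tie_cart)    # left
```

Under the literal reading, a case with x7 = 37 matches neither branch, whereas
the "<=" form always assigns it to one side.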

I've read through the rpart technical report, the rpart user's manual, the
rpart help file and see this practice illustrated, but don't find any
explanation for this minor (and probably trivial) departure from the
methodology illustrated in the CART program and in the Breiman et al. book.






=====================
Dr. Marc R. Feldesman
Professor and Chairman
Anthropology Department
Portland State University
1721 SW Broadway
Portland, Oregon 97201
email:  feldesmanm at pdx.edu
phone:  503-725-3081
fax:    503-725-3905
http://web.pdx.edu/~h1mf
PGP Key Available On Request
======================

"Anyway, no drug, not even alcohol, causes the fundamental ills of society.
If we're looking for the source of our troubles, we shouldn't test people
for drugs, we should test them for stupidity, ignorance, greed and love of
power."   P.J. O'Rourke

Powered by Optiplochoerus and Windows 2000 (scary isn't it?)

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._






