[R] Tree question

Liaw, Andy andy_liaw at merck.com
Tue Jul 15 20:59:03 CEST 2003


That should be "in"dependent variable.  The sentence you quoted from the
CART book does not make this clear on its own, but the paragraph that
follows it clearly indicates that they are talking about the predictor
variables, not the response.

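This is because the tree algorithm doesn't work on the original values of
the predictor variables, but only on the ranks of their unique values.  The
possible splits are all the "gaps" between adjacent unique values, so the
algorithm only needs the ranks.  Ranks are clearly invariant to increasing
monotone transformations.  The response is a different matter: the split
criterion is computed from the response values themselves, so transforming
the response can change the tree, as in your two fits.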
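
A minimal sketch of the point about ranks and gaps, in base R.  It is an
illustration only, not rpart's actual code: best_split() is a hypothetical
helper that tries every gap between unique predictor values and keeps the
one with the smallest within-node sum of squares, and the data are
simulated (loosely shaped like AGE and a count response), not the data
from the original post.

```r
# Hypothetical helper (not part of rpart): exhaustive search for the best
# single split of a numeric response y on one ordered predictor x.
best_split <- function(x, y) {
  ux <- sort(unique(x))
  cuts <- (head(ux, -1) + tail(ux, -1)) / 2   # one candidate per "gap"
  sse <- sapply(cuts, function(cc) {
    left  <- y[x <  cc]
    right <- y[x >= cc]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  best <- which.min(sse)
  list(cut = cuts[best], left = which(x < cuts[best]), sse = sse[best])
}

set.seed(1)
x <- sample(18:25, 300, replace = TRUE)   # simulated ordered predictor
y <- rpois(300, lambda = (26 - x) / 2)    # simulated count response

# Transforming the predictor leaves its ranks, and so the chosen
# partition of the observations, unchanged.
same_ranks <- identical(rank(x), rank(log(x)))
s_raw <- best_split(x, y)
s_log <- best_split(log(x), y)
same_partition <- identical(s_raw$left, s_log$left)

# Transforming the response changes the criterion being minimized, so
# the chosen split may (but need not) differ.
s_resp <- best_split(x, log(y + 1))

print(c(same_ranks = same_ranks, same_partition = same_partition))
```

The cut point itself is reported on whatever scale the predictor was
supplied in, so s_raw$cut and s_log$cut differ numerically even though
they send the same observations to each side.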

Andy

> -----Original Message-----
> From: Peter Flom [mailto:flom at ndri.org] 
> Sent: Tuesday, July 15, 2003 2:44 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Tree question
> 
> 
> I was under the impression that the tree method (e.g. as
> implemented in rpart) was insensitive to monotonic
> transformations of the dependent variable.  E.g. Breiman,
> Olshen et al., Classification and Regression Trees, state
> "In a standard data structure [a tree] is invariant under all
> monotone transformations of individual ordered variables" (p. 57)
> 
> However, I get very different results from
> tr.hh.pri <- rpart(log(YPRISX + 1) ~ AGE + DRUGUSEY + SEX + OBSXNUM)
> 
> and 
> 
> tr.hh.pri <- rpart(YPRISX ~ AGE + DRUGUSEY + SEX + OBSXNUM)
> 
> the former gives more splits and different splits.
> 
> Some notes:
> The DV is a count variable, and highly skewed, with some 0s,
> many 1s, and a long right tail out to 99.  AGE ranges from
> 18 to 25.  DRUGUSEY is ordered (hardest drug used), and
> OBSXNUM is also ordered (proportion of your friends who
> object to your having 'casual sex').
> 
> printing the first tree gives
> 
>  1) root 307 23.472040 0.7114605  
>    2) AGE>=19.5 196 13.811070 0.6857971  
>      4) OBSXNUM< 2.5 69  5.712526 0.6338252  
>        8) DRUGUSEY>=1.5 15  2.261203 0.5161601 *
>        9) DRUGUSEY< 1.5 54  3.185960 0.6665100 *
>      5) OBSXNUM>=2.5 127  7.810911 0.7140339 *
>    3) AGE< 19.5 111  9.303947 0.7567761  
>      6) DRUGUSEY< 0.5 48  1.105266 0.6727132 *
>      7) DRUGUSEY>=0.5 63  7.601052 0.8208239  
>       14) SEX>=1.5 21  1.258395 0.7317629 *
>       15) SEX< 1.5 42  6.092803 0.8653544 *
> 
> 
> printing the second tree gives
> 
>  1) root 307 144.540700 1.1205210  
>    2) AGE>=19.5 196  68.382650 1.0561220 *
>    3) AGE< 19.5 111  73.909910 1.2342340  
>      6) DRUGUSEY< 0.5 48   2.979167 0.9791667 *
>      7) DRUGUSEY>=0.5 63  65.428570 1.4285710  
>       14) SEX>=1.5 21   6.571429 1.1428570 *
>       15) SEX< 1.5 42  56.285710 1.5714290 *
> 
> 
> So, is this the 'exception that proves the rule'? Have I done 
> something wrong?  Or what?
> 
> Any ideas or thoughts?
> 
> Thanks in advance
> 
> 
> Peter
> 
> Peter L. Flom, PhD
> Assistant Director, Statistics and Data Analysis Core
> Center for Drug Use and HIV Research
> National Development and Research Institutes
> 71 W. 23rd St
> www.peterflom.com
> New York, NY 10010
> (212) 845-4485 (voice)
> (917) 438-0894 (fax)
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
