[R] method of rpart when response variable is binary?
    ronggui 
    ronggui.huang at gmail.com
       
    Fri Jun 15 15:27:35 CEST 2007
    
    
  
Dear all,
I would like to model the relationship between y and x. y is a binary
variable, and x is a count variable which may be Poisson-distributed.
I think it is better to divide x into intervals and convert it to a
factor before calling glm(y~x,data=dat,family=binomial).
I tried using rpart. As y is binary, I used the "class" method and got
the following result.
> rpart(y~x,data=dat,method="class")
n=778 (22 observations deleted due to missingness)
node), split, n, loss, yval, (yprob)
      * denotes terminal node
1) root 778 67 0 (0.91388175 0.08611825) *
With the default method, I get the following result.
> rpart(y~x,data=dat)
n=778 (22 observations deleted due to missingness)
node), split, n, deviance, yval
      * denotes terminal node
1) root 778 61.230080 0.08611825
  2) x< 19.5 750 53.514670 0.07733333
    4) x< 1.25 390 17.169230 0.04615385 *
    5) x>=1.25 360 35.555560 0.11111110 *
  3) x>=19.5 28  6.107143 0.32142860 *
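
For reference, a minimal sketch of how the method used by each call can be checked. According to the rpart documentation, when `method` is not given rpart guesses it from the response: "class" for a factor, "anova" for a plain numeric vector. So the default call above treats the 0/1 response as numeric and fits a regression tree. The data below are simulated stand-ins (hypothetical, not the data from this post).

```r
library(rpart)

# Simulated stand-in data (hypothetical; not the poster's data)
set.seed(1)
dat <- data.frame(x = rnbinom(800, size = 0.5, mu = 5))
dat$y <- rbinom(800, 1, ifelse(dat$x < 20, 0.08, 0.30))

# Numeric 0/1 response, no method given: rpart falls back to "anova"
fit_default <- rpart(y ~ x, data = dat)

# Factor response, no method given: rpart picks "class"
fit_factor <- rpart(factor(y) ~ x, data = dat)

# Numeric 0/1 response with the method stated explicitly
fit_class <- rpart(y ~ x, data = dat, method = "class")

# The method actually used is stored on the fitted object
fit_default$method
fit_factor$method
fit_class$method
```

Under "anova" the yval column is the mean of y in each node, which for a 0/1 response is the observed proportion of ones.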
If I use 1.25 and 19.5 as the cut points, I can change x into a factor by
> x2 <- cut(q34b,breaks=c(0,1.25,19.5,200),right=FALSE)
The coefficients in y~x2 are significant and make sense.
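
A minimal sketch of this binning step followed by the logistic fit, on simulated stand-in data (hypothetical; the column names `x`, `x2` and the simulated probabilities are assumptions made only for illustration):

```r
# Simulated stand-in data (hypothetical; not the poster's data)
set.seed(1)
n <- 800
x <- rnbinom(n, size = 0.5, mu = 5)   # overdispersed counts with a long tail
p <- ifelse(x < 1.25, 0.05, ifelse(x < 19.5, 0.11, 0.32))
dat <- data.frame(y = rbinom(n, 1, p), x = x)

# Bin x at the split points from the tree; right = FALSE gives
# left-closed intervals [0,1.25), [1.25,19.5), [19.5,200)
dat$x2 <- cut(dat$x, breaks = c(0, 1.25, 19.5, 200), right = FALSE)

# Logistic regression of y on the binned factor
fit <- glm(y ~ x2, data = dat, family = binomial)

table(dat$x2)               # number of cases per interval
summary(fit)$coefficients   # estimates, standard errors, z and p values
```

Note that the cut points here were chosen by looking at the same y, so the p-values from such a fit are likely to be optimistic.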
My question is: is it OK to use the default method in rpart when the
response variable is binary?  Thanks.
-- 
Ronggui Huang
Department of Sociology
Fudan University, Shanghai, China
    
    