[R] method of rpart when response variable is binary?

ronggui ronggui.huang at gmail.com
Fri Jun 15 15:27:35 CEST 2007


Dear all,

I would like to model the relationship between y and x. y is a binary
variable, and x is a count variable that may be Poisson-distributed.

I think it is better to divide x into intervals and convert it to a
factor before calling glm(y ~ x, data = dat, family = binomial).

I tried rpart. Since y is binary, I used the "class" method and got the
following result:
> rpart(y~x,data=dat,method="class")
n=778 (22 observations deleted due to missingness)

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 778 67 0 (0.91388175 0.08611825) *


With the default method, I get this result instead:

> rpart(y~x,data=dat)
n=778 (22 observations deleted due to missingness)

node), split, n, deviance, yval
      * denotes terminal node

1) root 778 61.230080 0.08611825
  2) x< 19.5 750 53.514670 0.07733333
    4) x< 1.25 390 17.169230 0.04615385 *
    5) x>=1.25 360 35.555560 0.11111110 *
  3) x>=19.5 28  6.107143 0.32142860 *
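For reference, the numbers in the default (anova) output can be reproduced by hand. With a 0/1 response, anova treats y as numeric, so yval is the proportion of 1s in the node and the "deviance" is the within-node sum of squares, n·p(1−p). The sketch below assumes the number of 1s in each node is n × yval (e.g. 778 × 0.08611825 = 67). That count is inferred from the printed output, not taken from the data. It also computes the misclassification count each node would have under the class method. No node has p ≥ 0.5, so every node predicts 0, and the three leaves together misclassify the same 67 cases as the root. This is consistent with the class tree stopping at the root.

```r
# Node sizes and counts of y == 1, back-calculated from the printed
# anova output as n * yval (an inference, not the raw data).
nodes <- data.frame(
  node = c("1 root", "2 x<19.5", "4 x<1.25", "5 x>=1.25", "3 x>=19.5"),
  n    = c(778, 750, 390, 360, 28),
  ones = c(67, 58, 18, 40, 9)
)

# anova fits the node mean, which for a 0/1 response is the share of 1s.
nodes$yval <- nodes$ones / nodes$n

# anova "deviance" = sum((y - p)^2) = ones - ones^2 / n = n * p * (1 - p)
nodes$deviance <- nodes$ones - nodes$ones^2 / nodes$n

# Misclassification count if each node predicts its majority class
# (the class method's node loss with the default loss matrix).
nodes$loss <- pmin(nodes$ones, nodes$n - nodes$ones)

print(nodes)

# Total loss over the three leaves of the anova tree.
leaves <- nodes$node %in% c("4 x<1.25", "5 x>=1.25", "3 x>=19.5")
sum(nodes$loss[leaves])  # 67, the same loss as the root
```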

If I use 1.25 and 19.5 as cut points and convert x into a factor with
> x2 <- cut(q34b, breaks = c(0, 1.25, 19.5, 200), right = FALSE)

then the coefficients of y ~ x2 are significant and make sense.
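A self-contained sketch of the cut-then-glm workflow is below. Because dat is not available, it simulates a hypothetical count predictor x and binary response y with made-up group probabilities. Only the breakpoints 1.25 and 19.5 come from the tree above. With right = FALSE the intervals are [0, 1.25), [1.25, 19.5) and [19.5, 200), matching the tree's "x < 1.25" and "x < 19.5" splits.

```r
# Hypothetical data: the real `dat` is not available, so simulate a
# skewed count predictor and a binary response (all values made up).
set.seed(42)
x <- c(rpois(390, 0.5), rpois(360, 8), rpois(50, 30))
# Assumed probability of y == 1 in each x range.
p <- ifelse(x < 1.25, 0.05, ifelse(x < 19.5, 0.11, 0.32))
y <- rbinom(length(x), size = 1, prob = p)
dat <- data.frame(x = x, y = y)

# Cut x at the rpart split points; right = FALSE makes intervals
# closed on the left, e.g. [1.25, 19.5).
dat$x2 <- cut(dat$x, breaks = c(0, 1.25, 19.5, 200), right = FALSE)
table(dat$x2)

# Logistic regression on the factor; the first interval is the baseline.
fit <- glm(y ~ x2, data = dat, family = binomial)
summary(fit)

# With a single factor, the fitted probability in each interval equals
# the observed proportion of 1s there (comparable to the tree's yval).
tapply(fitted(fit), dat$x2, mean)
```

Because the factor model is saturated in x2, it reproduces the interval proportions exactly, so the glm and the anova tree with the same cut points describe the same per-interval rates; the glm adds standard errors and tests for the differences.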

My question is: is it OK to use the default method in rpart when the
response variable is binary? Thanks.


-- 
Ronggui Huang
Department of Sociology
Fudan University, Shanghai, China


