[R] bug in rpart?
ligges at statistik.tu-dortmund.de
Fri May 22 19:43:57 CEST 2009
> I checked the Indian diabetes data again and got one tree for the data with
> reordered columns and another tree for the original data. Comparing the
> two trees, the split points are exactly the same, but the
> fitted classes differ for some cases, and the misclassification
> errors differ too. I know how CART deals with ties: even when we are
> using the same data, the subjects sent to the left and right need not be the
> same if we just rearrange the order of the covariates.
> But the problem is that the fitted trees have exactly the same split
> points. Shouldn't we get the same fitted values if the decisions are the
> same at each step? Why do trees with the same structure have different
> observations in their nodes?
Because they may use different surrogate variables. Note that your data
contain missing values that are handled by surrogates.
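One way to check this yourself is to compare the fitted classes of the two trees and see whether the disagreements fall on rows that have missing values. What follows is only a minimal sketch under assumptions: it uses MASS::Pima.tr2 (a Pima diabetes subset with NAs that ships with R) as a stand-in for your mydata, swaps its columns glu and bmi to mimic your reordering, and the names d1, d2, fit_a, fit_b and fit_a0 are made up for illustration.

```r
library(rpart)
library(MASS)   # provides Pima.tr2; a stand-in, NOT the poster's mydata

d1 <- Pima.tr2

# Same data with the positions of 'glu' and 'bmi' exchanged
cols <- names(d1)
i <- match(c("glu", "bmi"), cols)
cols[i] <- cols[rev(i)]
d2 <- d1[, cols]

fit_a <- rpart(type ~ ., data = d1, method = "class")
fit_b <- rpart(type ~ ., data = d2, method = "class")

# Fitted classes for every row; rows with NAs are routed by surrogates
pred_a <- predict(fit_a, newdata = d1, type = "class")
pred_b <- predict(fit_b, newdata = d2, type = "class")

# Cross-tabulate disagreement against "row has at least one NA"
differ <- as.character(pred_a) != as.character(pred_b)
print(table(differs = differ, has_NA = !complete.cases(d1)))

# summary(fit_a) lists the surrogate splits chosen at each node (long output)

# Refit without surrogates to see how much they matter for this tree
fit_a0 <- rpart(type ~ ., data = d1, method = "class",
                control = rpart.control(maxsurrogate = 0))
pred_a0 <- predict(fit_a0, newdata = d1, type = "class")
print(table(with_surrogates = pred_a, without_surrogates = pred_a0))
```

Whether the two orderings disagree at all on this stand-in data is not guaranteed; the point is only where to look. See ?rpart.control for the maxsurrogate and usesurrogate arguments.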
> The source code for running the diabetes data example and the output of
> trees are attached. Your professional opinion is very much appreciated.
> fit2 <- rpart(diabetes ~ ., data = mydata, method = "class")
> plot(fit2, uniform = TRUE, main = "CART for original data")
> ## misclassification table: rows are fitted class
>      neg pos
> neg  437  68
> pos   63 200
> fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")
> plot(fit3, uniform = TRUE, main = "CART after exchanging mass & glucose")
> ## after exchanging the order of BODY mass and PLASMA glucose
>      neg pos
> neg  436  64
> pos   64 204