Andrew Ziem AZiem at us.ci.org
Thu Feb 17 20:23:19 CET 2011

After ctree builds a tree, how can I determine which branch observations with missing values follow by examining the BinaryTree-class object? For instance, in the example below Bare.nuclei has 16 missing values and is used for the first split, but the missing values are not listed in either set of factor levels. (I have the same question for missing values in numeric [non-factor] variables, but I assume the answer is similar.)

> require(party)
> require(mlbench)
> data(BreastCancer)
> BreastCancer$Id <- NULL
> ct <- ctree(Class ~ . , data=BreastCancer, controls = ctree_control(maxdepth = 1))
> ct

         Conditional inference tree with 2 terminal nodes

Response:  Class 
Inputs:  Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size, Bare.nuclei, Bl.cromatin, Normal.nucleoli, Mitoses 
Number of observations:  699 

1) Bare.nuclei == {1, 2}; criterion = 1, statistic = 488.294
  2)*  weights = 448 
1) Bare.nuclei == {3, 4, 5, 6, 7, 8, 9, 10}
  3)*  weights = 251 
> sum(is.na(BreastCancer$Bare.nuclei))
[1] 16
> nodes(ct, 1)[[1]]$psplit
Bare.nuclei == {1, 2}
> nodes(ct, 1)[[1]]$ssplit

Based on the counts below, the answer is node 2 (448 = 432 + 16), but I don't see it in the object.

> sum(BreastCancer$Bare.nuclei %in% c(1,2,NA))
[1] 448
> sum(BreastCancer$Bare.nuclei %in% c(1,2))
[1] 432
> sum(BreastCancer$Bare.nuclei %in% c(3:10))
[1] 251
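
A more direct check of the same thing, as a minimal sketch: I am assuming that where() from party returns the terminal node ID of each training observation, in the same row order as the data (my reading of the BinaryTree-class help page). Tabulating those IDs for the rows with a missing Bare.nuclei should show which node they were sent to. The names node_id and na_rows are just illustrative.

```r
library(party)
library(mlbench)

data(BreastCancer)
BreastCancer$Id <- NULL
ct <- ctree(Class ~ ., data = BreastCancer,
            controls = ctree_control(maxdepth = 1))

# Terminal node ID for every training observation
# (assumption: same row order as BreastCancer)
node_id <- where(ct)

# Rows whose Bare.nuclei value is missing
na_rows <- is.na(BreastCancer$Bare.nuclei)

# Number of missing-value rows that ended up in each terminal node
table(node_id[na_rows])
```

This only shows where the missing values ended up; it still does not tell me where the rule itself is stored in the object.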


