[R] missing values in party::ctree

Torsten Hothorn Torsten.Hothorn at stat.uni-muenchen.de
Fri Feb 18 09:07:45 CET 2011


On Thu, 17 Feb 2011, Andrew Ziem  wrote:

> After ctree builds a tree, how would I determine the direction missing values follow by examining the BinaryTree-class object?  For instance in the example below Bare.nuclei has 16 missing values and is used for the first split, but the missing values are not listed in either set of factors.   (I have the same question for missing values among numeric [non-factor] values, but I assume the answer is similar.)

Hi Andrew,

ctree() doesn't treat missings in factors as a category in its own right.
Instead, it uses surrogate splits to determine the daughter node to which
observations with missings in the primary split variable are sent (you
need to specify `maxsurrogate' in ctree_control()).
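
A minimal sketch of the surrogate-split route, reusing the data and calls
from the quoted example. The value maxsurrogate = 3 is an arbitrary
choice for illustration, and as far as I recall the party documentation
notes that surrogate splits are only implemented for ordered covariates,
so the result may differ from what you expect:

```r
library(party)
library(mlbench)

data(BreastCancer)
BreastCancer$Id <- NULL

# Ask ctree() to evaluate up to 3 surrogate splits per node
# (3 is an arbitrary illustrative value; the default is 0).
ct <- ctree(Class ~ ., data = BreastCancer,
            controls = ctree_control(maxdepth = 1, maxsurrogate = 3))

# Primary split of the root node, as in the quoted output
nodes(ct, 1)[[1]]$psplit

# Surrogate splits of the root node; this list was empty in the
# quoted output because no surrogates had been requested
ssplit <- nodes(ct, 1)[[1]]$ssplit
ssplit
```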

However, you can recode your factor and add NA to the levels. This will
lead to the intended behaviour.
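
One way to do the recoding is base R's addNA(), sketched here on a small
made-up factor (the toy vector is hypothetical, not taken from
BreastCancer):

```r
# Toy factor with two missing values (hypothetical data)
x <- factor(c("1", "2", NA, "3", NA))
levels(x)        # NA is not among the levels

# addNA() appends NA as an extra level, so the former missings
# become ordinary members of that level
x_na <- addNA(x)
levels(x_na)     # NA is now the last level
table(x_na)      # the NA level is counted like any other category

# Applied to the data in this thread (requires party and mlbench):
# BreastCancer$Bare.nuclei <- addNA(BreastCancer$Bare.nuclei)
# ct <- ctree(Class ~ ., data = BreastCancer,
#             controls = ctree_control(maxdepth = 1))
```

After refitting, the NA level should appear in one of the two level sets
printed for the split, which answers the question directly from the
printed tree.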

Best,

Torsten

>
>
>> require(party)
>> require(mlbench)
>> data(BreastCancer)
>> BreastCancer$Id <- NULL
>> ct <- ctree(Class ~ . , data=BreastCancer, controls = ctree_control(maxdepth = 1))
>> ct
>
>         Conditional inference tree with 2 terminal nodes
>
> Response:  Class
> Inputs:  Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size, Bare.nuclei, Bl.cromatin, Normal.nucleoli, Mitoses
> Number of observations:  699
>
> 1) Bare.nuclei == {1, 2}; criterion = 1, statistic = 488.294
>  2)*  weights = 448
> 1) Bare.nuclei == {3, 4, 5, 6, 7, 8, 9, 10}
>  3)*  weights = 251
>> sum(is.na(BreastCancer$Bare.nuclei))
> [1] 16
>> nodes(ct, 1)[[1]]$psplit
> Bare.nuclei == {1, 2}
>> nodes(ct, 1)[[1]]$ssplit
> list()
>
>
>
> Based on below, the answer is node 2, but I don't see it in the object.
>
>> sum(BreastCancer$Bare.nuclei %in% c(1,2,NA))
> [1] 448
>> sum(BreastCancer$Bare.nuclei %in% c(1,2))
> [1] 432
>> sum(BreastCancer$Bare.nuclei %in% c(3:10))
> [1] 251
>
>
> Andrew
>
>


