[R] ctree (party) plot meaning question
gavin.simpson at ucl.ac.uk
Mon Jun 30 23:06:23 CEST 2008
On Mon, 2008-06-30 at 10:41 -0700, Birgitle wrote:
> I tried to use ctree but am not sure about the meaning of the plot.
> My.data.ct<-ctree(Resp~., data=My.data)
> My data.frame contains 88 explanatory variables (continuous, ordered/unordered
> multistate,count data) and one response with two groups.
> In the plot are only two variables shown (2 internal nodes) and 3 final
> nodes. Does it mean that only these two variables show a significant
> association with the response?
> Many thanx in advance
Yes, very simply. Nodes are only split if a split has a p-value of less
than 1-mincriterion, where mincriterion is 0.95 by default, in a test of
independence between the response variable and the predictor.
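As a rough illustration of that rule, here is a minimal sketch in base R. The helper would_split() is hypothetical (it is not part of party); it only mirrors the comparison described above, assuming the p-value is compared against 1 - mincriterion.

```r
# Hypothetical helper, not a party function: a node is split only when the
# p-value of the independence test is below 1 - mincriterion.
would_split <- function(p_value, mincriterion = 0.95) {
  alpha <- 1 - mincriterion   # significance level implied by mincriterion
  p_value < alpha
}

would_split(0.01)                      # TRUE:  0.01 < 0.05
would_split(0.10)                      # FALSE: 0.10 > 0.05
would_split(0.10, mincriterion = 0.8)  # TRUE:  0.10 < 0.20
```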
Using a built-in data set:
    library(party)
    mod <- ctree(Species ~ ., data = iris)
The plot (on my machine) shows 3 internal nodes resulting in 4 leaves.
Petal length and petal width are the two selected variables. The Sepal
length and width variables are not selected.
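If you'd rather not read the selected variables off the plot, something like the following sketch may help. It assumes party's nodes() and where() accessors, and that each node is a list with `terminal` and `psplit$variableName` elements, which is how I understand the fitted tree to be stored; treat it as an illustration rather than the recommended way.

```r
library(party)

mod <- ctree(Species ~ ., data = iris)

# Node ids run from 1 to the largest terminal node id (assumed numbering).
all_nodes <- nodes(mod, seq_len(max(where(mod))))

# Flag terminal nodes (leaves) versus internal (split) nodes.
is_leaf <- sapply(all_nodes, function(n) n$terminal)

# Name of the variable used at each internal node.
split_vars <- sapply(all_nodes[!is_leaf], function(n) n$psplit$variableName)

table(is_leaf)       # number of internal nodes and leaves
unique(split_vars)   # variables that were actually selected
```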
Now what happens if we reduce mincriterion? (This is a silly example -
you wouldn't want to select a split with a p-value that high):
    mod1 <- ctree(Species ~ ., data = iris,
                  control = ctree_control(mincriterion = 0.8))
Now we see that a further split on Petal width has been made, but notice
the p-value for this split.
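The p-values can also be pulled out of the fitted object instead of being read from the plot. The sketch below assumes each node stores `criterion$maxcriterion` as 1 minus the (by default adjusted) p-value of the selected variable, which is my understanding of what the plot labels are computed from; the data frame name split_info is just for illustration.

```r
library(party)

mod1 <- ctree(Species ~ ., data = iris,
              control = ctree_control(mincriterion = 0.8))

# Collect every node, then keep only the internal (split) nodes.
all_nodes <- nodes(mod1, seq_len(max(where(mod1))))
inner <- Filter(function(n) !n$terminal, all_nodes)

# One row per split: node id, split variable, and p-value of the test.
split_info <- data.frame(
  node     = sapply(inner, function(n) n$nodeID),
  variable = sapply(inner, function(n) n$psplit$variableName),
  p_value  = sapply(inner, function(n) 1 - n$criterion$maxcriterion)
)
print(split_info)
```

Every p-value listed should be below 1 - mincriterion (0.2 here); any split with a p-value above 0.05 is one the default settings would not have made.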
So nodes are only split if the null hypothesis of independence between
the response and the predictors can be rejected at the given level of
significance (1 - mincriterion).
This is a different approach to rpart/mvpart, where the tree is grown
until a few simple stopping rules are met and cross-validation is then
used to prune it back.
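For contrast, a minimal sketch of the grow-then-prune workflow in rpart. Picking the cp value with the smallest cross-validated error is only one common choice (a 1-SE rule is another), and the seed is there solely to make the cross-validation repeatable.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random

# Grow the tree using rpart's default stopping rules.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Choose the complexity parameter with the lowest cross-validated error.
best <- which.min(fit$cptable[, "xerror"])
cp_best <- fit$cptable[best, "CP"]

# Prune the tree back to that complexity.
pruned <- prune(fit, cp = cp_best)
printcp(pruned)
```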
You'd be best to read the cited references in ?ctree for more background
on these conditional inference trees.
> The art of living is more like wrestling than dancing.
> (Marcus Aurelius)