[R] tests for significance on conditional inference trees from party package
Achim.Zeileis at uibk.ac.at
Tue Dec 13 21:22:43 CET 2016
Thanks for your interest.
On Tue, 13 Dec 2016, Adrian Johnson wrote:
> Dear group,
> Please allow me to ask a naive question and pardon if it is qualified
> as stupid question.
> I am using party package to classify covariates and predict distribution
> of survival times for the classified variables. Typically I have a
> matrix of covariates (columns) including outcome data (overall survival
> in months, censor status) and other covariates I want to split in tree
> (such as treatment dose etc.). Rows are patients (~1000 patients).
> Now similarly I have many such matrices (4K) with completely different
> set of covariates but identical outcome data and patients (in rows). I
> cannot combine all data into a giant matrix, because these covariates are
> totally independent.
If the response variable is the same and the patients are the same, then I
don't see why, conceptually, you couldn't combine "totally independent"
variables in the same tree. Or maybe I misunderstand what "totally
independent" means.
Practically, however, choosing a tree from 4,000 regressor variables will
be challenging, especially if you want to adjust in some way for the
multiple testing. So maybe some additional structure would help here.
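To illustrate what adjusting for multiple testing could look like, here is a
rough sketch in base R. It assumes that one p-value has been collected per
covariate matrix (for example the root-node p-value of each tree). The
vector `pvals` is simulated and purely hypothetical, not output from any
real analysis.

```r
set.seed(1)

# Hypothetical input: one p-value per covariate matrix (4,000 in total).
# Most are drawn from the null, a handful are made artificially small.
pvals <- c(runif(3990), runif(10, min = 0, max = 1e-6))

# Family-wise error control (Holm) and false discovery rate control (BH)
p_holm <- p.adjust(pvals, method = "holm")
p_bh   <- p.adjust(pvals, method = "BH")

# Rank the matrices by raw p-value and show the adjusted values alongside
ranking <- order(pvals)
ranked <- data.frame(
  matrix_id = ranking,
  p_raw     = pvals[ranking],
  p_holm    = p_holm[ranking],
  p_bh      = p_bh[ranking]
)
head(ranked)
```

Whether such a ranking is meaningful depends on what the 4,000 analyses are
supposed to answer, so this is only one possible reading of the problem.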
> Currently I am running this model in a loop and storing the tree and
> parsing the tree structure.
Parsing the tree structure is quite cumbersome in the old "party"
implementation. This was one of the main motivations to establish the
reimplementation in "partykit". This has a much better and more accessible
tree infrastructure. See the vignettes in the "partykit" package for more
details - especially vignette("partykit", package = "partykit") gives a
good overview of the building blocks.
Additionally, over at StackOverflow you can find various additional
bits and pieces that may be helpful. Look for the "party" tag.
Finally, there is also a partykit support forum on R-Forge.
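As a minimal sketch of how the p-values can be pulled out of a "partykit"
tree without parsing printed output: the data below are simulated, and the
variable names (dose, age, noise, os_months, event) are made up for
illustration only.

```r
library("partykit")
library("survival")

set.seed(42)
n <- 300

# Simulated stand-in for one covariate matrix plus the shared outcome
d <- data.frame(
  dose  = factor(sample(c("low", "high"), n, replace = TRUE)),
  age   = rnorm(n, mean = 60, sd = 10),
  noise = rnorm(n)
)
# By construction, survival depends on dose only
d$os_months <- rexp(n, rate = ifelse(d$dose == "high", 0.02, 0.08))
d$event     <- rbinom(n, size = 1, prob = 0.8)  # 1 = event, 0 = censored

# Conditional inference tree for a censored response
ct <- ctree(Surv(os_months, event) ~ dose + age + noise, data = d)
print(ct)

# Inner (splitting) nodes are all node ids minus the terminal ones
inner <- setdiff(nodeids(ct), nodeids(ct, terminal = TRUE))

# p-value stored in the info slot of each inner node
split_p <- if (length(inner) > 0) {
  unlist(nodeapply(ct, ids = inner, FUN = function(nd) info_node(nd)$p.value))
} else {
  numeric(0)
}
split_p
```

The same nodeapply() pattern can be run inside a loop over many data sets,
storing `split_p` for each fitted tree.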
> My question is, is there some testing method to choose or rank these 4K
> trees such that I can select each tree from top to bottom. I know each
> tree is important in its own way.
It is not clear to me what/how you want to rank the results. However,
looking at the sources of information listed above might take you a few
steps further.
> If selection based on significance is required, then is there any other
> way instead of conditional inference tree, that partitions data but
> will also carry some significance to choose from.
The MOB (model-based recursive partitioning) algorithm is also based on
significance tests and implemented in the "partykit" package. It uses
parametric asymptotic inference rather than nonparametric conditional
inference. Otherwise the two approaches are very similar in many respects.
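A small sketch of MOB, using the lmtree() convenience interface from
"partykit" on simulated data with made-up variable names (y, x, z1, z2).
It fits a linear model of y on x and tests z1 and z2 as partitioning
variables. For a censored response one would presumably pass a suitable
model-fitting function to mob() instead. That part is an assumption and is
not shown here.

```r
library("partykit")

set.seed(7)
n <- 400

# x is the regressor; z1 and z2 are candidate partitioning variables
d <- data.frame(x = runif(n), z1 = runif(n), z2 = runif(n))

# By construction, the slope of y on x flips sign at z1 = 0.5;
# z2 is pure noise
d$y <- ifelse(d$z1 > 0.5, 3, -3) * d$x + rnorm(n, sd = 0.5)

# Model-based recursive partitioning: y ~ x within nodes, split on z1/z2
mt <- lmtree(y ~ x | z1 + z2, data = d)
print(mt)

# Parameter instability tests stored in the root node
# (test statistics and p-values for each partitioning variable)
root_test <- nodeapply(mt, ids = 1, FUN = function(nd) info_node(nd)$test)[[1]]
root_test

# Node-wise coefficients of the fitted linear models
coef(mt)
```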
Hope that helps,