[R] rpart: apply tree to new data to get "counts"

Stephen Milborrow milbo at sonic.net
Tue Aug 30 20:14:45 CEST 2011


Jay <josip.2000 at gmail.com> wrote:
> When I have made a decision tree with rpart, is it possible to "apply"
> this tree to a new set of data in order to find out the distribution
> of observations? Ideally I would like to plot my original tree, with
> the counts (at each node) of the new data.

Sadly, neither plot.rpart nor rpart.plot supports plotting a tree trained on 
one set of data but showing results predicted for a new set of data.  Page 
21 of the vignette for the rpart.plot package has this to say:

"Arguably the most serious limitation of the current implementation is its 
inability to display results on test data (on the tree derived from the 
training data)."

One way of implementing this (quite a lot of work) would be to extend the 
rpart function to include a newdata argument.  If given such an argument, 
rpart would additionally return new.frame, new.where, and new.y fields 
(corresponding to the existing frame, where, and y fields).  The plotting 
functions could then trivially be extended to use these new fields.
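
In the meantime, the counts themselves can be computed by hand, even if 
they cannot be plotted on the tree.  What follows is only a sketch under 
stated assumptions, not part of rpart or rpart.plot.  It relies on 
predict.rpart with type = "vector" returning frame$yval for the leaf each 
row lands in, so overwriting yval in a copy of the fit with the node 
numbers makes predict return leaf node numbers.  It also relies on rpart's 
node numbering, where the parent of node n is n %/% 2.  The names 
fit_nodes, new_leaf, and new_counts are made up for illustration, and the 
kyphosis data (shipped with rpart) is used only as an example.

```r
library(rpart)

# Example data only: split kyphosis into a training set and a "new" set.
set.seed(1)
idx   <- sample(nrow(kyphosis), 50)
train <- kyphosis[idx, ]
test  <- kyphosis[-idx, ]

fit <- rpart(Kyphosis ~ Age + Number + Start, data = train)

# Assumption: predict(type = "vector") returns frame$yval for each row's leaf.
# Work on a copy whose yval holds the node numbers (the frame's row names).
node_ids  <- as.numeric(rownames(fit$frame))
fit_nodes <- fit
fit_nodes$frame$yval <- node_ids

# Leaf node number reached by each row of the new data.
new_leaf <- predict(fit_nodes, newdata = test, type = "vector")

# Count new observations at every node, internal nodes included, by walking
# from each leaf up to the root (parent of node n is n %/% 2).
new_counts <- setNames(integer(length(node_ids)), node_ids)
for (leaf in new_leaf) {
  n <- leaf
  while (n >= 1) {
    key <- as.character(n)
    new_counts[key] <- new_counts[key] + 1L
    n <- n %/% 2
  }
}

# Original frame alongside the counts for the new data.
print(cbind(fit$frame[, c("var", "n")], new.n = new_counts))
```

The new.n column can be compared by eye with the n column of the original 
frame.  Getting these numbers onto the plotted tree is the part that still 
needs the changes described above.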
