[R] Rpart, custom penalty for an error

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Sep 10 21:36:10 CEST 2006


On Sun, 10 Sep 2006, Maciej Bliziński wrote:

> Hello all R-help list subscribers,
> 
> I'd like to create a regression tree of a data set with a binary response
> variable. Only 5% of observations are a success, so the regression tree
> will not really find any combinations of variable values that yield
> a probability of success above 50%.

That would be a misuse of a regression tree: this is exactly the problem
for which classification trees were designed.

> I am however interested in areas where the probability of success is 
> noticeably higher than 5%, for example 20%. I've tried rpart and the 
> weights option, increasing the weights of the success-observations.

You are 'misleading' rpart by using 'weights', claiming to have case
weights for cases you do not have.  You need to use misclassification
costs instead (in rpart, a loss matrix supplied via 'parms').
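
A minimal sketch of the cost-based approach, under stated assumptions: the
data are simulated, and the variable names, the 10:1 cost ratio and the
0.8 threshold are all hypothetical. Note that the 'cost' argument of
rpart() itself holds per-variable costs, so misclassification costs are
passed as a loss matrix through parms = list(loss = ...).

```r
library(rpart)

set.seed(1)
n  <- 2000
x1 <- runif(n)
x2 <- runif(n)

# Hypothetical data: about 5% successes overall, about 20% where x1 > 0.8.
p   <- ifelse(x1 > 0.8, 0.20, 0.02)
y   <- factor(rbinom(n, 1, p), levels = c(0, 1),
              labels = c("fail", "success"))
dat <- data.frame(y, x1, x2)

# Loss matrix: rows are the true class, columns the predicted class,
# with zeros on the diagonal. Here missing a success is assumed to cost
# ten times as much as a false alarm.
loss <- matrix(c( 0, 1,
                 10, 0), nrow = 2, byrow = TRUE)

# Classification tree with unequal misclassification costs, no case weights.
fit <- rpart(y ~ x1 + x2, data = dat, method = "class",
             parms = list(loss = loss))

print(fit)
```

Because no case weights are supplied, the node sizes in fit$frame$n are
the observed numbers of cases, and plot(fit); text(fit, use.n = TRUE)
should label the leaves with counts from the sample itself.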

This is a standard issue, discussed in all good books on classification
(including mine).

> It works as expected in terms of the tree creation: instead of a single
> root, a tree is being built. But the tree plot() and text() are somewhat
> misleading. I'm interested in the observation counts inside each leaf.
> I use the "use.n = TRUE" parameter. The counts displayed are misleading,
> the numbers of successes are not the original numbers from the sample,
> they seem to be cloned success-observations.

They _are_ the original numbers, for that is what 'case weights' means.
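
A small illustration of that point, again on simulated data with
hypothetical names and a hypothetical weight of 10: a case weight of 10
tells rpart that the row stands for ten identical cases, so the totals it
reports are sums of weights rather than numbers of rows.

```r
library(rpart)

set.seed(2)
n <- 1000
x <- runif(n)
y <- factor(rbinom(n, 1, ifelse(x > 0.8, 0.20, 0.02)), levels = c(0, 1))
dat <- data.frame(y, x)

# Hypothetical case weights: each success counts as ten cases.
dat$w <- ifelse(dat$y == "1", 10, 1)

fit_w <- rpart(y ~ x, data = dat, weights = w, method = "class")

root <- fit_w$frame[1, ]
root$n        # number of rows in the data
root$wt       # sum of the case weights, larger than the number of rows
table(dat$y)  # class counts actually observed

# Class counts stored for the root node; with case weights these are
# expected to be weighted counts (assumption about the yval2 layout).
fit_w$frame$yval2[1, 2:3]
```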

> I'd like to split the tree just as weights parameter allows me to,
> keeping the original number of observations in the tree plot. Is it
> possible? If yes, how?
> 
> Kind regards,
> Maciej

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


