[R] rpart returning only 1 node

Brian D. Ripley ripley at stats.ox.ac.uk
Mon Mar 10 09:04:29 CET 2003


On Mon, 10 Mar 2003, Ko-Kang Kevin Wang wrote:

> Hi,
> 
> This may actually be a theoretical question.
> 
> When I tried to do the following:
> 
> ##########################################################
> > colnames(rating.adclms)
>  [1] "usage"    "mileage"  "sex"      "excess"   "ncd"     
>  [6] "primage"  "minage"   "drivers"  "district" "cargroup"
> [11] "car.age"  "adclms"   "days"    
> > rating.r1 <- rpart(adclms ~ ., data = rating.adclms, 
> +                                method = "class")
> > rating.r1
> n= 140602 
> 
> node), split, n, loss, yval, (yprob)
>       * denotes terminal node
> 
> 1) root 140602 3792 0 (9.730303e-01 2.506365e-02 1.834967e-03 7.112274e-05) *
> ##########################################################
> 
> Should I set the costs in rpart()?  I'm kind of surprised to see it only 
> return 1 node for the tree.

Why are you surprised?  One class has 97% of the examples, so it may be
impossible to find a single split that makes a worthwhile improvement (1%,
the default cp = 0.01) in classification.  You probably want to lower cp
(an argument to rpart.control).
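
A minimal sketch of lowering cp, on made-up data rather than the poster's
rating.adclms. The data frame toy, its columns, the two-class response and
the value cp = 0.0001 are all assumptions chosen for illustration; a
sensible cp for real data would normally be picked from the
cross-validation table that printcp() reports.

```r
library(rpart)

set.seed(1)
n <- 20000
# Hypothetical data: class "1" is rare in every region of the predictors,
# so the majority class "0" is the best guess almost everywhere.
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- factor(rbinom(n, 1, ifelse(toy$x1 > 0.7, 0.10, 0.01)))

# Default control: cp = 0.01
fit_default <- rpart(y ~ x1 + x2, data = toy, method = "class")

# Much smaller cp (an arbitrary illustrative value), passed via rpart.control
fit_small_cp <- rpart(y ~ x1 + x2, data = toy, method = "class",
                      control = rpart.control(cp = 0.0001))

# Compare tree sizes: one row of $frame per node
nrow(fit_default$frame)
nrow(fit_small_cp$frame)

# Cross-validated error for each candidate cp, useful for pruning back
printcp(fit_small_cp)
```

A tree grown with a very small cp can be large and overfitted, so it would
usually be pruned afterwards with prune(fit_small_cp, cp = ...) at a value
read off the printcp() table.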

You could use losses, but I would use weighted sub-sampling of the
training set. See my 1996 book on Pattern Recognition and Neural Networks
for the theory and the practical details.
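
Both suggestions can be sketched on made-up two-class data. Everything
here is an assumption for illustration: the data frame toy, the cost of 20
for missing a rare case, and the 2:1 sampling ratio. This is not the
procedure from the book, only one simple way to try the two ideas.

```r
library(rpart)

set.seed(2)
n <- 20000
# Hypothetical data: class "1" is rare everywhere (at most 10%)
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- factor(rbinom(n, 1, ifelse(toy$x1 > 0.7, 0.10, 0.01)))

# (a) Loss matrix: rows are true classes, columns are predicted classes,
#     with zeros on the diagonal. matrix() fills column by column, so
#     calling a true "1" a "0" costs 20 and the reverse error costs 1.
loss <- matrix(c(0, 20,
                 1,  0), nrow = 2)
fit_loss <- rpart(y ~ x1 + x2, data = toy, method = "class",
                  parms = list(loss = loss))

# (b) Sub-sampling: keep every rare case and draw a random subset of the
#     common class (here twice as many as the rare cases, an arbitrary ratio).
rare   <- which(toy$y == "1")
common <- which(toy$y == "0")
keep   <- c(rare, sample(common, size = 2 * length(rare)))
fit_sub <- rpart(y ~ x1 + x2, data = toy[keep, ], method = "class")

# Number of nodes in each fitted tree
nrow(fit_loss$frame)
nrow(fit_sub$frame)
```

Class probabilities predicted by the sub-sampled fit reflect the
sub-sample's class proportions, not the original ones, so they would need
adjusting before being read as probabilities for the full data.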

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


