[R] coding of categories in rpart

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Apr 28 17:16:11 CEST 2004


On Tue, 27 Apr 2004, Prabhakar Krishnamurthy wrote:

> I am using rpart to derive classification rules for customer segments.
> I have a few categorical variables in the set of independent variables.
> For instance,
> 
> Account Size can be (Very-Small, Small, Medium, Large, V-Large)
> 
> Rpart seems to encode these categories into: a,b,c,d,e

It doesn't.  The letters are just one of several output representations of
the factor levels.

> The results are expressed in terms of the encoded values.
> 
> How do I find out what encoding was used by rpart, i.e.
> what categories in my input set do a, b, c, ... correspond to?

By reading the documentation!  E.g. ?text.rpart.
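
A minimal sketch of what that documentation describes, under stated
assumptions: the data frame `d`, its columns and the `Segment` response are
made up for illustration and are not from the original question. As far as
?labels.rpart describes it, minlength = 1 abbreviates factor levels to single
letters in level order, so the i-th letter should stand for the i-th level;
minlength = 0 (or pretty = 0 in text()) asks for the full level names.

```r
library(rpart)

# Hypothetical data: a five-level factor predictor and a two-class response.
sizes <- c("Very-Small", "Small", "Medium", "Large", "V-Large")
d <- data.frame(
  AccountSize = factor(rep(sizes, each = 40), levels = sizes)
)
d$Segment <- factor(ifelse(d$AccountSize %in% c("Large", "V-Large"), "A", "B"))

fit <- rpart(Segment ~ AccountSize, data = d, method = "class")

# The factor levels seen at fitting time are stored on the fitted object.
lev <- attr(fit, "xlevels")$AccountSize

# Letter-to-level key: a = first level, b = second level, and so on.
key <- setNames(lev, letters[seq_along(lev)])
print(key)

# Split labels with single-letter abbreviations, then with full level names.
print(labels(fit, minlength = 1))
print(labels(fit, minlength = 0))

# On a plotted tree, the same choice is made through 'pretty':
# plot(fit); text(fit, pretty = 0)
```

If the letters in your own output do not match levels(your_data$your_factor)
read in order, check whether the factor was releveled or had levels dropped
before the model was fitted.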

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



