[R] "tree-ID" in any segmentation package available?

Torsten Hothorn hothorn at hothorn.de
Thu Apr 19 14:45:59 CEST 2007


On Thu, 19 Apr 2007, Florian Koller-Meinfelder wrote:

> Dear R-helpers,
>
> I am looking for a segmentation package that gives some "tree identifier"
> as output for every observation in the data set (my response variable is
> binary). I have skimmed through "rpart", "ada" and "adabag": The output
> "trees" gives you the formula, but I have to run several thousand
> segmentations on different data sets and it is tricky to use this
> information within a macro (the only thing I could think of is to use some
> string manipulation on the tree formula and apply it to the data, but I
> hope there is an easier way - e.g. if the algorithm created 12 different
> trees a vector that links every observation to one of these 12 segments
> would be ideal).
>

Is this

> library("party")
> airq <- subset(airquality, !is.na(Ozone))
> airct <- ctree(Ozone ~ ., data = airq,
+                controls = ctree_control(maxsurrogate = 3))
> where(airct)
  [1] 5 5 5 5 5 5 5 5 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 3 5 6 9 9 6 5 5 5 5 5 8 9
 [38] 6 8 9 8 8 8 8 5 6 6 3 6 8 8 9 3 8 8 6 9 8 8 8 6 3 6 6 8 8 8 8 9 8 9 6 6 5
 [75] 3 5 6 6 5 5 6 3 8 9 8 8 8 8 8 8 8 8 9 6 6 5 5 6 5 3 5 5 3 5 5 5 6 5 5 6 5
[112] 5 3 5 5 5

what you want? `where' returns, for each observation in the learning
sample, the number of the terminal node that the observation falls into.
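
If the goal is to carry that vector around as a segment identifier, something
along the following lines might do. It is only a sketch: the column name
`node' and the objects `segment_sizes' and `segments' are made up for
illustration, and the model is simply the one fitted above.

```r
library("party")

# Same model as in the session above
airq  <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(maxsurrogate = 3))

# One terminal-node number per row of the learning sample
# ("node" is an illustrative column name, not something party creates)
airq$node <- where(airct)

# Number of observations per segment
segment_sizes <- table(airq$node)

# One data frame per segment, keyed by terminal-node number
segments <- split(airq, airq$node)
```

Since the result is a plain integer vector, it should be straightforward to
collect it inside a loop over many data sets without any string manipulation
on the printed tree.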

Best wishes,

Torsten


> Cheers,
> Florian


