[R] partykit ctree: minbucket and case weights

Torsten Hothorn Torsten.Hothorn at uzh.ch
Wed Jun 25 14:20:54 CEST 2014


Dear Amber,

your data contain missing values and you don't use surrogate splits to 
deal with them. So, observations with a missing value in a split variable 
are passed down the tree randomly (there is no "majority" argument to 
"ctree_control"!), and thus terminal nodes that are too small may be 
created.
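
To see whether a fitted tree is affected, one can count the missing values 
per variable and tabulate how many observations end up in each terminal 
node. The following is only a sketch: the "airquality" data shipped with R 
stand in for the poster's data set, and the control settings are 
illustrative rather than a recommendation.

```r
library(partykit)

# Stand-in data (an assumption, not the data from this thread):
# airquality has missing values in the predictor Solar.R.
# Rows with a missing response are dropped first.
airq <- subset(airquality, !is.na(Ozone))

# Number of missing values per variable
na_counts <- colSums(is.na(airq))
print(na_counts)

# Fit a tree without requesting surrogate splits
ct <- ctree(Ozone ~ ., data = airq,
            control = ctree_control(minsplit = 12, minbucket = 4))

# Number of observations assigned to each terminal node
node_sizes <- table(predict(ct, type = "node"))
print(node_sizes)
```

Any entry in node_sizes below the requested minbucket points to a node 
that received fewer observations than intended.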

Simply use surrogate splits (maxsurrogate = 3, for example) and the tree 
will be deterministic, with terminal nodes of the correct size.
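
A minimal sketch of such a call, reusing the control settings from the 
original post and adding maxsurrogate = 3. The "airquality" data are again 
an assumed stand-in for the poster's data, so the resulting tree is purely 
illustrative.

```r
library(partykit)

# Stand-in data (an assumption): airquality with missing responses removed
airq <- subset(airquality, !is.na(Ozone))

# Settings from the original post, with surrogate splits enabled
ctrl <- ctree_control(testtype = "Bonferroni", mincriterion = 0.90,
                      minsplit = 12, minbucket = 4, maxsurrogate = 3)

ct_surr <- ctree(Ozone ~ ., data = airq, control = ctrl)

# Observations per terminal node, for comparison with minbucket
sizes_surr <- table(predict(ct_surr, type = "node"))
print(sizes_surr)
```

With surrogates available, an observation whose primary split variable is 
missing is routed by the best available surrogate variable instead of at 
random.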

Best,

Torsten

On Mon, 9 Jun 2014, Amber Dawn Nolder wrote:

> I have attached the data set (cavl) and R code used when I got the results I 
> posted about. I included the code I used at the top of the document. Below 
> that is the version of R used and some of the results I obtained.
> Many thanks!
> Amber 
> On Wed, 4 Jun 2014 09:12:15 +0200 (CEST)
> Torsten Hothorn <Torsten.Hothorn at uzh.ch> wrote:
>> 
>> On Tue, 3 Jun 2014, Amber Dawn Nolder wrote:
>> 
>>> I apologize for my lack of knowledge with R. I usually load my data as a 
>>> csv file. May I send that to you? I was not sure if I could do so on the 
>>> list.
>> 
>> yes, and the R code you used. Thanks,
>> 
>> Torsten
>> 
>>> Thank you!
>>> On Fri, 30 May 2014 09:37:23 +0200 (CEST)
>>> Torsten Hothorn <Torsten.Hothorn at uzh.ch> wrote:
>>>> 
>>>> Amber,
>>>> 
>>>> this looks like an error -- could you pls send me a reproducible example 
>>>> so that I can track the problem down?
>>>> 
>>>> Best,
>>>> 
>>>> Torsten
>>>> 
>>>> 
>>>> ________________________________________________________________
>>>> 
>>>> Prof. Dr. Torsten Hothorn                       =========
>>>>                                                  \\
>>>> Universitaet Zuerich                             \\
>>>> Institut fuer Epidemiologie, Biostatistik und     \\
>>>> Praevention, Abteilung Biostatistik               //
>>>> Hirschengraben 84                                //
>>>> CH-8001 Zuerich                                 //
>>>> Schweiz                                        //
>>>>                                                 ==========
>>>> Telephon:  +41 44 634 48 17
>>>> Fax:       +41 44 634 43 86
>>>> Web:       http://tiny.uzh.ch/6p
>>>> ________________________________________________________________
>>>> 
>>>> On Wed, 28 May 2014, Achim Zeileis wrote:
>>>> 
>>>>> In case you haven't seen it already...
>>>>> 
>>>>> Best regards,
>>>>> Z
>>>>> 
>>>>> ---------- Forwarded message ----------
>>>>> Date: Wed, 28 May 2014 17:16:12 -0400
>>>>> From: Amber Dawn Nolder <a.d.nolder at iup.edu>
>>>>> To: r-help at r-project.org
>>>>> Subject: [R] partykit ctree: minbucket and case weights
>>>>> 
>>>>> 
>>>>>    Hello,
>>>>>    I am an R novice, and I am using the "partykit" package to create
>>>>>    regression trees. I used the following to generate the trees:
>>>>>    ctree(y ~ x1 + x2 + x3 + x4, data = my_data,
>>>>>    control = ctree_control(testtype = "Bonferroni", mincriterion = 0.90,
>>>>>    minsplit = 12, minbucket = 4, majority = TRUE))
>>>>>    I thought that "minbucket" set the minimum value for the sum of weights
>>>>>    in each terminal node, and that each case weight is 1 unless otherwise
>>>>>    specified, so that the sum of case weights in a node should equal the
>>>>>    number of cases (n) in that node. However, I sometimes obtain a tree
>>>>>    with a terminal node that contains fewer than 4 cases.
>>>>>    My data set has a total of 36 cases. The dependent and all independent
>>>>>    variables are continuous. Variables x1 and x2 contain missing (NA)
>>>>>    values.
>>>>>    Could someone please explain why I am getting these results?
>>>>>    Am I mistaken about the value of case weights or about the use of
>>>>>    minbucket to restrict the size of a terminal node?
>>>>>    This is an example of the output:
>>>>>    Model formula:
>>>>>    y ~ x1 + x2 + x3 + x4
>>>>>    Fitted party:
>>>>>    [1] root
>>>>>    |   [2] x4 <= 30: 0.927 (n = 17, err = 1.1)
>>>>>    |   [3] x4 > 30
>>>>>    |   |   [4] x2 <= 43: 0.472 (n = 8, err = 0.4)
>>>>>    |   |   [5] x2 > 43
>>>>>    |   |   |   [6] x3 <= 0.4: 0.282 (n = 3, err = 0.0)
>>>>>    |   |   |   [7] x3 > 0.4: 0.020 (n = 8, err = 0.0)
>>>>>    Number of inner nodes:    3
>>>>>    Number of terminal nodes: 4
>>>>>    Many thanks!
>>>>>    Amber Nolder
>>>>>    Graduate Student
>>>>>    Indiana University of Pennsylvania
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>> 
>>> 
>>> 
>
>

