[R] party with mob - parameter estimates not significant in terminal nodes

Achim Zeileis Achim.Zeileis at uibk.ac.at
Tue Oct 5 15:45:22 CEST 2010


Tudor:

> I successfully model-based partitioned several datasets through the use 
> of mob from the party package (thanks Achim et al. once again !!!).  At 
> times, however, the partitioning leads to terminal nodes in which the 
> parameter estimates of the model are not significant (although the split 
> points and in general the proposed segmentation both seem reasonable).

There are two aspects to this:

(1) The algorithm only assesses whether the coefficients differ
significantly between the child nodes. It does not assess whether they
are significantly different from zero within each node. As an example:
You may have a tree with a single split and two child nodes. In the first
child node, the parameter is highly significant, but in the second node,
it is not significant at all.

(2) Due to partitioning, it may be the case that not all parameters of the 
model are identified in all child nodes. Currently, within mob(), this is 
not systematically checked. In particular, you may have (quasi-)complete 
separation in binomial GLMs if a child node is particularly "pure". This 
seems to have happened in your example below. From a machine learning
point of view, this is not a bad thing; you just need to interpret it
correctly.
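
A small sketch of what complete separation looks like in a binomial GLM.
The six observations are invented for illustration and have nothing to do
with your data; the response is perfectly determined by the sign of x.

```r
# Hypothetical "pure" sub-sample: y is 1 exactly when x > 0
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# glm() warns about fitted probabilities of 0 or 1; suppressed here
fit <- suppressWarnings(glm(y ~ x, family = binomial))

est <- coef(summary(fit))
print(est)            # very large estimate with an even larger standard error
print(deviance(fit))  # residual deviance essentially zero
print(fit$iter)       # many Fisher scoring iterations
```

The pattern is the same as in your Node 3: huge estimates and standard
errors, p-values close to 1, a residual deviance of practically zero, and
an unusually large number of Fisher scoring iterations.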

> As I do not seem to be able to come up with an intuitive 
> explanation/interpretation for this (other than that the partitioning 
> model may be appropriate for parts of the dataset(s)), I wonder if any 
> of you could share your thoughts on this topic with me.  For your 
> convenience I attached a relevant set of results below.

I guess that the variable "P" is binary and that, when you cross-tabulate
it with the response for Node 3, there are zeros in the contingency
table, i.e., you may have a perfect split in that one sub-sample.
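
A check along these lines could look as follows. The data frame node3 and
its columns y and P are hypothetical stand-ins for the observations that
end up in Node 3; how you extract them depends on your own setup.

```r
# Hypothetical stand-in for the Node 3 sub-sample, not the real data
node3 <- data.frame(
  y = c(0, 0, 0, 0, 1, 1, 1, 1),
  P = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Cross-tabulate response and regressor; empty cells signal separation
tab <- table(response = node3$y, P = node3$P)
print(tab)
has_empty_cell <- any(tab == 0)

# If P is numeric rather than binary, compare its range per response class:
# non-overlapping ranges also indicate complete separation
rng <- sapply(split(node3$P, node3$y), range)
print(rng)
separated <- rng[2, "0"] < rng[1, "1"] || rng[2, "1"] < rng[1, "0"]
```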

hth,
Z


$`2`

Call:
NULL

Deviance Residuals:
                Min                   1Q               Median                   3Q                  Max
-2.1613499829328759  -0.1182099512510448   0.0000000000000000   0.1199438072333263   1.7963628663418680

Coefficients:
                       Estimate          Std. Error  z value              Pr(>|z|)
(Intercept) 38.6736721222665096  5.1182299436934375  7.55606  0.000000000000041545 ***
P           -3.8195232976021787  0.5042297985419135 -7.57497  0.000000000000035922 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

     Null deviance: 407.0806101624161  on 293  degrees of freedom
Residual deviance: 132.0087256781199  on 292  degrees of freedom
AIC: 136.0087256781199

Number of Fisher Scoring iterations: 7


$`3`

Call:
NULL

Deviance Residuals:
                    Min                       1Q                   Median                       3Q                      Max
-0.00009134433923085110   0.00000000000000000000   0.00000000000000000000   0.00000000000000000000   0.00009204763394325872

Coefficients:
                         Estimate           Std. Error  z value Pr(>|z|)
(Intercept)   1755.7555999083327 601505.6700290179579  0.00292  0.99767
P             -181.3394660743267  62127.5207770660636 -0.00292  0.99767

(Dispersion parameter for binomial family taken to be 1)

     Null deviance: 94.20918454290385568583588  on 67  degrees of freedom
Residual deviance:  0.00000001683616309495537  on 66  degrees of freedom
AIC: 4.000000016836163

Number of Fisher Scoring iterations: 25


