[R] MOB (party package) Question - Variable Selection

Achim Zeileis Achim.Zeileis at uibk.ac.at
Wed Aug 7 19:37:53 CEST 2013


Michael:

> Hi. I am a grad student and I'm currently using the MOB function in the 
> R party package and I had a question. I am working on an environmental 
> problem with about 100 predictors. I am having trouble determining which 
> predictors to use for regression and which for partitioning, is there 
> any sort of method to determine this?

That depends a little bit on what exactly you are trying to achieve. When 
we developed MOB, we had the following situation in mind:

- You have some sort of data for which you know from the literature that a 
certain type of model works well. For example, log(y) ~ log(x1) + log(x2) 
or something like that.

- But you also have data on a number of other variables for which you 
don't yet know how they should enter the model. Often these are 
categorical variables or numerical variables that are not part of the 
standard theory.

- Then MOB is one possible approach to check whether these additional 
variables affect the basic standard model or not. And by recursive 
partitioning you could capture various types of main and interaction 
effects.
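
As a rough illustration of this setup, here is a minimal sketch with 
simulated data. The data frame and the variable names (ly, lx1, lx2, z1, 
z2) as well as the minsplit value are made up for this example and have 
nothing to do with your data. The logs are computed beforehand so that 
the basic model is a plain linear regression, and the additional 
variables go after the | as partitioning variables:

```r
library(party)  # provides mob(); linearModel comes from modeltools, which party loads

set.seed(1)
n  <- 400
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
z1 <- factor(sample(c("a", "b"), n, replace = TRUE))  # hypothetical partitioning variable
z2 <- runif(n)                                        # hypothetical pure-noise variable

# Simulated truth (an assumption for this sketch): the slope on log(x1)
# differs between the two levels of z1, everything else is constant.
beta1 <- ifelse(z1 == "a", 0.5, 1.5)
ly    <- 1 + beta1 * log(x1) + 0.3 * log(x2) + rnorm(n, sd = 0.2)

d <- data.frame(ly = ly, lx1 = log(x1), lx2 = log(x2), z1 = z1, z2 = z2)

# Basic model before the |, candidate partitioning variables after it.
fm <- mob(ly ~ lx1 + lx2 | z1 + z2, data = d,
          model = linearModel, control = mob_control(minsplit = 40))

print(fm)  # tree structure with the model fitted in each node
coef(fm)   # coefficients of the basic model per terminal node
```

Whether any split is found depends on the data, of course. If the tree 
stays at a single node, the parameter stability tests found no evidence 
that the basic model changes along any of the partitioning variables.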

However, if you just have a response variable and a bunch of regressors 
about which you don't have much prior knowledge, and you want to select 
both the relevant variables and their functional form, then MOB might 
help you, but other methods might be more natural, for example GAMs or 
boosting.
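
To give an idea of what the GAM route could look like, here is a minimal 
sketch using the mgcv package. This is just one possible way of doing it, 
and the simulated data and variable names (x1, x2, x3, y) are assumptions 
for illustration only. With select = TRUE, gam() adds an extra penalty to 
each smooth term so that terms can be shrunk towards zero, which gives a 
form of variable selection together with flexible functional forms:

```r
library(mgcv)  # recommended package, shipped with standard R installations

set.seed(2)
n <- 300
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))  # hypothetical regressors

# Simulated truth (an assumption for this sketch): nonlinear in x1,
# linear in x2, no effect of x3.
d$y <- sin(2 * pi * d$x1) + 0.5 * d$x2 + rnorm(n, sd = 0.3)

# One smooth term per regressor; select = TRUE allows whole terms
# to be penalized towards zero.
fit <- gam(y ~ s(x1) + s(x2) + s(x3), data = d,
           method = "REML", select = TRUE)

summary(fit)  # the edf column indicates how much each smooth term contributes
```

With about 100 predictors you would have to think about how many smooth 
terms the sample size can support, so take this only as a starting point.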

> Does it cause problems if a variable is used for both regression and 
> partitioning?

In principle, this is possible. Whether or not this is meaningful and/or 
easy to interpret depends on the particular data though.

> I attempted to pre-screen the variables using stepwise linear regression 
> and I used the selected variables for regression and all others for 
> partitioning. However this led to the model only having one node.

That's not very surprising, is it? With the stepwise selection you already 
tried to capture the potential influence of all regressors on your 
response. Of course, MOB might still have turned up a few additional 
interactions, but I'm not surprised that it didn't.

We've obtained the most useful results when the basic model had relatively 
few parameters and was easy/natural to interpret.

Hope that helps,
Z
