[R] User defined split function in rpart

Mon Feb 11 18:29:42 CET 2008

I had a similar problem, trying to use lme within a custom rpart
function.  I got around it by passing the data frame I needed through
the parms argument of rpart, and then treating parms as a dataset
inside the evaluation, init and split functions.  It's not the most
elegant solution, but it works.
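
A minimal sketch of that workaround, under assumptions the message does
not spell out. The names dat, resp, row_id, init_fun, eval_fun and
split_fun are made up for illustration, and the toy data frame is all
numeric. The sketch also assumes one way of finding the node's rows:
the model response is a row index, so the y vector handed to each
callback can be used to subset the data frame carried in parms. The
anova-style split is modeled on the usersplits.R example that ships
with rpart and handles continuous predictors only.

```r
library(rpart)

# Toy data: `resp` is the real response; `row_id` is a stand-in response
# whose only job is to identify rows inside the callbacks.
set.seed(1)
dat <- data.frame(x1 = runif(60), x2 = runif(60))
dat$resp <- ifelse(dat$x1 > 0.5, 3, 0) + rnorm(60, sd = 0.3)
dat$row_id <- seq_len(nrow(dat))

# init: hand parms back unchanged so eval and split receive the data frame
init_fun <- function(y, offset, parms, wt) {
  list(y = c(y), parms = parms, numresp = 1, numy = 1,
       summary = function(yval, dev, wt, ylevel, digits)
         paste("mean =", format(signif(yval, digits))))
}

# eval: y holds the row ids of the current node
eval_fun <- function(y, wt, parms) {
  node <- parms[y, , drop = FALSE]          # data of the current node
  m <- sum(node$resp * wt) / sum(wt)
  list(label = m, deviance = sum(wt * (node$resp - m)^2))
}

# split: anova-style goodness computed on the looked-up response
split_fun <- function(y, wt, x, parms, continuous) {
  if (!continuous) stop("this sketch handles continuous predictors only")
  r <- parms$resp[y]
  r <- r - sum(r * wt) / sum(wt)            # center the response
  n <- length(r)
  temp <- cumsum(r * wt)[-n]
  left.wt <- cumsum(wt)[-n]
  right.wt <- sum(wt) - left.wt
  lmean <- temp / left.wt
  rmean <- -temp / right.wt
  list(goodness = (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * r^2),
       direction = sign(lmean))
}

fit <- rpart(row_id ~ x1 + x2, data = dat,
             method = list(init = init_fun, eval = eval_fun, split = split_fun),
             parms = dat,
             control = rpart.control(minsplit = 10, xval = 0))
print(fit)
```

Inside eval_fun, `node` is the place where a model such as lme or polr
could be fitted to the rows of the current node.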

Have you (or anyone else) figured out the details of the summary and
text options in the init function?  I know that they are used to fill
out the summary of the model and the text.rpart plotting, but I can't
seem to use any of the variables being passed to them efficiently (or
at all).
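
For reference, here is a sketch of the two functions with the argument
lists that rpart's built-in anova method uses for them. As far as I can
tell, yval, dev, wt and n arrive as vectors with one entry per node
being reported, and use.n is the flag given to text(). The names
summary_fun and text_fun and the label formats are hypothetical; this
is a starting point to experiment with, not a definitive answer.

```r
# Hypothetical summary/text functions to return from init alongside
# y, parms, numresp and numy.
summary_fun <- function(yval, dev, wt, ylevel, digits) {
  paste0("  mean=", format(signif(yval, digits)),
         ", MSE=", format(signif(dev / wt, digits)))
}

text_fun <- function(yval, dev, wt, ylevel, digits, n, use.n) {
  lab <- format(signif(yval, digits))
  if (use.n) paste0(lab, "\nn=", n) else lab
}

# Calling them by hand with two made-up nodes shows the expected shapes:
# one character string per node.
node_summary <- summary_fun(c(1.234, 2.5), dev = c(10, 4), wt = c(5, 2),
                            ylevel = NULL, digits = 3)
node_labels <- text_fun(c(1.234, 2.5), dev = c(10, 4), wt = c(5, 2),
                        ylevel = NULL, digits = 3, n = c(5, 2), use.n = TRUE)
print(node_summary)
print(node_labels)
```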

Hope that helps,
Sam Stewart

On Feb 20, 2007 2:47 PM, Tobias Guennel <tguennel at vcu.edu> wrote:
> I have made some progress with the user defined splitting function and I got
> a lot of the things I needed to work. However, I am still stuck on accessing
> the node data. It would probably be enough if somebody could tell me how I
> can access the original data frame of the call to rpart.
> So if the call is:
>   fit0 <- rpart(Sat ~ Infl + Cont + Type, housing,
>                 control = rpart.control(minsplit = 10, xval = 0),
>                 method = alist)
> how can I access the housing data frame within the user defined splitting
> function?
>
> Any input would be highly appreciated!
>
> Thank you
> Tobias Guennel
>
>
> -----Original Message-----
> From: Tobias Guennel [mailto:tguennel at vcu.edu]
> Sent: Monday, February 19, 2007 3:40 PM
> To: 'r-help at stat.math.ethz.ch'
> Subject: [R] User defined split function in rpart
>
> Maybe I should explain my problem in a little more detail.
> The rpart package allows for user defined split functions. An example is
> given in the source/test directory of the package as usersplits.R.
> The comments say that three functions have to be supplied:
> 1. "The 'evaluation' function.  Called once per node.
>   Produce a label (1 or more elements long) for labeling each node,
>   and a deviance."
> 2. The split function, where most of the work occurs.
>    Called once per split variable per node.
> 3. The init function:
>    fix up y to deal with offsets
>    return a dummy parms list
>    numresp is the number of values produced by the eval routine's "label".
>
> I have altered the evaluation function and the split function for my needs.
> Within those functions, I need to fit a proportional odds model to the data
> of the current node. I am using the polr() routine from the MASS package to
> fit the model.
> Now my problem is: how can I call the polr() function with only the data of
> the current node? This is what I have tried so far:
>
> evalfunc <- function(y, x, parms, data) {
>   pomnode <- polr(data$y ~ data$x, data, weights = data$Freq)
>   parprobs <- predict(pomnode, type = "probs")
>   dev <- 0
>   K <- dim(parprobs)[2]
>   N <- dim(parprobs)[1] / K
>   for (i in 1:N) {
>     tempsum <- 0
>     Ni <- 0
>     for (l in 1:K) {
>       Ni <- Ni + data$Freq[K * (i - 1) + l]
>     }
>     for (j in 1:K) {
>       tempsum <- tempsum +
>         data$Freq[K * (i - 1) + j] / Ni *
>         log(parprobs[i, j] * Ni / data$Freq[K * (i - 1) + j])
>     }
>     dev <- dev + Ni * tempsum
>   }
>   dev <- -2 * dev
>   wmean <- 1
>   list(label = wmean, deviance = dev)
> }
>
> I get the error: Error in eval(expr, envir, enclos) : argument "data" is
> missing, with no default
>
> How can I use the data of the current node?
>
> Thank you
> Tobias Guennel
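
On the error quoted above: in the usersplits.R example the evaluation
function has the signature function(y, wt, parms), and rpart calls it
with exactly those three arguments. A fourth formal such as data is
therefore never supplied, and x ends up holding the weights. The sketch
below reproduces the message without rpart; call_eval, bad_eval and
good_eval are hypothetical stand-ins, not part of any package.

```r
# Stand-in for the way rpart invokes a user eval function: three
# positional arguments (y, wt, parms) and nothing else.
call_eval <- function(f) f(1:3, rep(1, 3), list(note = "parms go here"))

# Four formals: `data` is never filled in, so touching it fails.
bad_eval <- function(y, x, parms, data) nrow(data)
msg <- tryCatch(call_eval(bad_eval), error = function(e) conditionMessage(e))
print(msg)

# Three formals matching the call: anything extra has to travel in parms.
good_eval <- function(y, wt, parms) {
  list(label = sum(y * wt) / sum(wt), deviance = 0)
}
res <- call_eval(good_eval)
print(res$label)
```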