[R] model.frame: how does one use it?

Marc Schwartz marc_schwartz at comcast.net
Fri Jun 15 21:33:24 CEST 2007


On Fri, 2007-06-15 at 10:47 -0500, Dirk Eddelbuettel wrote: 
> Philipp Benner filed a Debian bug report against r-cran-rpart aka rpart.
> In short, the issue has to do with how rpart evaluates a formula and
> supporting arguments, in particular 'weights'.  
> 
> A simple contrived example is
> 
> -----------------------------------------------------------------------------
> library(rpart)
> 
> ## using data from help(rpart), set up simple example
> myformula <- formula(Kyphosis ~ Age + Number + Start)
> mydata <- kyphosis
> myweight <- abs(rnorm(nrow(mydata)))
> 
> goodFunction <- function(mydata, myformula, myweight) {
>   hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
>   prev <- hyp
> }
> goodFunction(mydata, myformula, myweight)
> cat("Ok\n")
> 
> ## now remove myweight and try to compute it inside a function
> rm(myweight)
> 
> badFunction <- function(mydata, myformula) {
>   myweight <- abs(rnorm(nrow(mydata)))
>   mf <- model.frame(myformula, mydata, myweight)
>   print(head(mf))
>   hyp <- rpart(myformula,
>                data=mf,
>                weights=myweight,
>                method="class")
>   prev <- hyp
> }
> badFunction(mydata, myformula)
> cat("Done\n")
> -----------------------------------------------------------------------------
> 
> Here goodFunction works, but only because myweight (with useless random
> weights, but that is not the point here) is found from the calling
> environment. 
> 
> badFunction fails after we remove myweight from there:
> 
> :~> cat /tmp/philipp.R | R --slave
> Ok
> Error in eval(expr, envir, enclos) : object "myweight" not found
> Execution halted
> :~>    
> 
> As I was able to replicate it, I reported this to the package maintainer.  It
> turns out that seemingly all is well as this is supposed to work this way,
> and I got a friendly pointer to study model.frame and its help page.  
> 
> Now I am stuck as I can't make sense of model.frame -- see badFunction
> above. I would greatly appreciate any help in making rpart work with a local
> argument weights so that I can tell Philipp that there is no bug.  :)
> 
> Regards, Dirk


Dirk,

As you note, the issue is the non-standard evaluation of the arguments
in model.frame(). The key section of the Details in ?model.frame is:


All the variables in formula, subset and in ... are looked for first in
data and then in the environment of formula (see the help for formula()
for further details) and collected into a data frame. Then the subset
expression is evaluated, and it is used as a row index to the data
frame. Then the na.action function is applied to the data frame (and may
well add attributes). The levels of any factors in the data frame are
adjusted according to the drop.unused.levels and xlev arguments.
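
A minimal sketch of that lookup order, calling model.frame() directly on
a toy data frame rather than on kyphosis. The names 'd', 'f',
'wts_local' and 'local_weights' are made up for illustration:

```r
## Toy data; 'd', 'f', 'wts_local' and 'local_weights' are made-up names.
d <- data.frame(y = c(1, 2, 3, 4), x = c(2, 4, 6, 8))
f <- y ~ x        # environment(f) is wherever this line is evaluated

local_weights <- function(formula, data) {
  wts_local <- rep(2, nrow(data))   # lives only in this function's frame
  ## 'weights' travels through '...', so model.frame() evaluates it in
  ## 'data' first and then in environment(formula), not in this frame.
  try(model.frame(formula, data, weights = wts_local), silent = TRUE)
}

res_missing <- local_weights(f, d)  # a "try-error": wts_local is not found

## Putting the weights into 'data' makes the same call succeed.
d$wts_local <- rep(2, nrow(d))
res_found <- local_weights(f, d)
names(res_found)                    # includes a "(weights)" column
```

The first call fails for the same reason as badFunction: the local
variable is in neither 'data' nor the environment of the formula.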


Note that even with your goodFunction(), if 'myweight' is created within
the environment of the function and not in the global environment, it
still fails:

library(rpart)
myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis

goodFunction <- function(mydata, myformula) {
  ## myweight now exists only inside the function
  myweight <- abs(rnorm(nrow(mydata)))
  hyp <- rpart(myformula, data=mydata,
               weights=myweight, method="class")
  prev <- hyp
}


> goodFunction(mydata, myformula)
Error in eval(expr, envir, enclos) : object "myweight" not found


However, now let's do this:


library(rpart)
myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis
myweight <- abs(rnorm(nrow(mydata)))

goodFunction <- function(mydata, myformula) {
  ## myweight is not defined here; it lives in the global environment
  hyp <- rpart(myformula, data=mydata,
               weights=myweight, method="class")
  prev <- hyp
}

> goodFunction(mydata, myformula)
> 

It works because the formula was created in the global environment, so
that is the environment of the formula, and 'myweight' is found there.
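
To check which environment a formula carries, one can inspect it with
environment(). A small sketch, where 'f_global', 'f_local' and
'make_formula' are hypothetical names used only for this illustration:

```r
## 'f_global', 'f_local' and 'make_formula' are made-up names.
f_global <- y ~ x                  # created where this script is evaluated

make_formula <- function() {
  y ~ x                            # created inside the function call
}
f_local <- make_formula()

## At top level this is "R_GlobalEnv"
environmentName(environment(f_global))

## The two formulas look the same but carry different environments,
## so extra arguments such as 'weights' are looked up in different places.
identical(environment(f_local), environment(f_global))
```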


Now, as a final example, try this:


library(rpart)
goodFunction <- function() {
  ## formula, data and weights are all created inside the function
  myformula <- formula(Kyphosis ~ Age + Number + Start)
  mydata <- kyphosis
  myweight <- abs(rnorm(nrow(mydata)))

  hyp <- rpart(myformula, data=mydata,
               weights=myweight, method="class")
  prev <- hyp
}

> goodFunction()
> 

It works because the formula is created within the environment of the
function and hence, 'myweight', which is created there as well, is
found.
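
If the formula has to be passed in from outside, the same lookup rule
suggests two possible ways to use locally computed weights. This is only
a sketch under that reading of ?model.frame, not something taken from
the rpart documentation; the function names 'fitWithColumn' and
'fitWithLocalEnv' are made up:

```r
library(rpart)

myformula <- formula(Kyphosis ~ Age + Number + Start)  # created globally
mydata <- kyphosis

## Option 1 (hypothetical name): store the weights as a column of 'data',
## which is searched before the environment of the formula.
fitWithColumn <- function(mydata, myformula) {
  mydata$myweight <- abs(rnorm(nrow(mydata)))
  rpart(myformula, data = mydata, weights = myweight, method = "class")
}

## Option 2 (hypothetical name): point the local copy of the formula at
## this function's environment, where 'myweight' is defined.
fitWithLocalEnv <- function(mydata, myformula) {
  myweight <- abs(rnorm(nrow(mydata)))
  environment(myformula) <- environment()  # changes the local copy only
  rpart(myformula, data = mydata, weights = myweight, method = "class")
}

fit1 <- fitWithColumn(mydata, myformula)
fit2 <- fitWithLocalEnv(mydata, myformula)
```

Option 1 leaves the formula untouched, and the extra column is not used
as a predictor because the formula names its terms explicitly. Option 2
keeps the data untouched, but the fitted object then holds a reference
to the function's environment.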

There was a (non) bug filed on a related matter dealing with the
evaluation of 'subset':

http://bugs.r-project.org/cgi-bin/R/feature%26FAQ?id=3671

and you might find this document on Non-Standard Evaluation helpful:

http://developer.r-project.org/nonstandard-eval.pdf

HTH,

Marc


