[R] model.frame: how does one use it?

Deepayan Sarkar deepayan.sarkar at gmail.com
Fri Jun 15 21:23:48 CEST 2007


On 6/15/07, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> Philipp Benner filed a Debian bug report against r-cran-rpart aka rpart.
> In short, the issue has to do with how rpart evaluates a formula and
> supporting arguments, in particular 'weights'.
>
> A simple contrived example is
>
> -----------------------------------------------------------------------------
> library(rpart)
>
> ## using data from help(rpart), set up simple example
> myformula <- formula(Kyphosis ~ Age + Number + Start)
> mydata <- kyphosis
> myweight <- abs(rnorm(nrow(mydata)))
>
> goodFunction <- function(mydata, myformula, myweight) {
>   hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
>   prev <- hyp
> }
> goodFunction(mydata, myformula, myweight)
> cat("Ok\n")
>
> ## now remove myweight and try to compute it inside a function
> rm(myweight)
>
> badFunction <- function(mydata, myformula) {
>   myweight <- abs(rnorm(nrow(mydata)))
>   mf <- model.frame(myformula, mydata, myweight)
>   print(head(df))
>   hyp <- rpart(myformula,
>                data=mf,
>                weights=myweight,
>                method="class")
>   prev <- hyp
> }
> badFunction(mydata, myformula)
> cat("Done\n")
> -----------------------------------------------------------------------------
>
> Here goodFunction works, but only because myweight (with useless random
> weights, but that is not the point here) is found from the calling
> environment.
>
> badFunction fails after we remove myweight from there:
>
> :~> cat /tmp/philipp.R | R --slave
> Ok
> Error in eval(expr, envir, enclos) : object "myweight" not found
> Execution halted
> :~>
>
> As I was able to replicate it, I reported this to the package maintainer.  It
> turns out that seemingly all is well as this is supposed to work this way,
> and I got a friendly pointer to study model.frame and its help page.
>
> Now I am stuck as I can't make sense of model.frame -- see badFunction
> above. I would greatly appreciate any help in making rpart work with a local
> argument weights so that I can tell Philipp that there is no bug.  :)

I don't know if ?model.frame is the best page to look at. There's a
more detailed description at

http://developer.r-project.org/nonstandard-eval.pdf

but here are the non-standard evaluation rules as I understand them:
given a name in either (1) the formula or (2) ``special'' arguments like
'weights' in this case, or 'subset', try to find the name

1. in 'data'
2. failing that, in environment(formula)
3. failing that, in the enclosing environment, and so on.

By 'name', I mean a symbol, such as 'Age' or 'myweight'.  So
basically, everything is as you would expect if the name is visible in
data, but if not, the search starts in the environment of the formula,
not in the environment where the function call is being made (which is
where standard evaluation would look).  In your badFunction, the
formula was created at top level, so the local 'myweight' is never
seen.  This is a feature, not a bug (things would be a lot more
confusing if it were the other way round).
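
To illustrate these rules, here is a minimal sketch that uses lm() from
base R instead of rpart (as far as I know, lm() builds its model frame
in the same way).  The data and the names 'local_wts', 'fitLocal' and
'fitInside' are made up for this example:

```r
## Toy data; 'f' is created at top level, so environment(f) is the
## environment this script is run in, not any function's frame.
d <- data.frame(x = 1:10, y = c(2, 4, 5, 4, 6, 7, 9, 8, 10, 12))
f <- y ~ x

## 'local_wts' exists only inside the function: it is neither in 'd'
## nor in environment(f), so the lookup is expected to fail.
fitLocal <- function(f, d) {
    local_wts <- seq_len(nrow(d)) / nrow(d)
    lm(f, data = d, weights = local_wts)
}
res <- try(fitLocal(f, d), silent = TRUE)
print(inherits(res, "try-error"))

## Here the formula itself is created inside the function, so
## environment(f_inside) is the function's frame and 'local_wts' is found.
fitInside <- function(d) {
    local_wts <- seq_len(nrow(d)) / nrow(d)
    f_inside <- y ~ x
    lm(f_inside, data = d, weights = local_wts)
}
fit <- fitInside(d)
print(coef(fit))
```

The only difference between the two functions is where the formula was
created, which is what decides whether the local weights are visible.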


With this in mind, either of the following might do what you want:

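## Option 1: put the weights into 'data', where they are looked up first
badFunction <- function(mydata, myformula) {
    ## modifies only the local copy of mydata
    mydata$myweight <- abs(rnorm(nrow(mydata)))
    hyp <-
        rpart(myformula,
              data=mydata,
              weights=myweight,
              method="class")
    prev <- hyp
}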


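## Option 2: make the formula's environment the function's own frame
badFunction <- function(mydata, myformula) {
    myweight <- abs(rnorm(nrow(mydata)))
    ## names not found in 'data' are now looked up here
    environment(myformula) <- environment()
    hyp <-
        rpart(myformula,
              data=mydata,
              weights=myweight,
              method="class")
    prev <- hyp
}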
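
As a rough check of the two approaches, the sketch below applies both
to a toy lm() fit (again a stand-in for rpart, on the assumption that
the lookup works the same way).  The function names 'viaData' and
'viaEnv', the column name 'wts_col' and the data are hypothetical.  It
also shows that resetting the formula's environment inside the function
only affects the local copy of the formula:

```r
d <- data.frame(x = 1:10, y = c(2, 4, 5, 4, 6, 7, 9, 8, 10, 12))
f <- y ~ x
w <- seq_len(nrow(d)) / nrow(d)
env_before <- environment(f)

## Approach 1: store the weights as a column of (a local copy of) the data
viaData <- function(d, f, w) {
    d$wts_col <- w
    lm(f, data = d, weights = wts_col)
}

## Approach 2: point the formula's environment at the function's frame
viaEnv <- function(d, f, w) {
    local_wts <- w
    environment(f) <- environment()
    lm(f, data = d, weights = local_wts)
}

fitA <- viaData(d, f, w)
fitB <- viaEnv(d, f, w)

## Same weights, so both fits should agree
print(all.equal(coef(fitA), coef(fitB)))
## The caller's formula keeps its original environment
print(identical(environment(f), env_before))
```

One thing to keep in mind with the first approach is that the new
column name must not clash with an existing column in the data.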

-Deepayan


