[Rd] Problem using model.frame with argument subset in own function

Douglas Bates bates at stat.wisc.edu
Sun Aug 9 18:32:15 CEST 2009


On Sat, Aug 8, 2009 at 1:31 PM, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
> Dear List,

> I am writing a formula method for a function in a package I maintain. I
> want the method to return a data.frame that potentially only contains
> some of the variables in 'data', as specified by the formula.

The usual way to call model.frame (the method that Thomas Lumley has
called "the standard, non-standard evaluation") is to match the call to
foo, replace the name of the function being called with
as.name("model.frame"), and force an evaluation in the parent frame.
It looks like this:

    mf <- match.call()
    if (missing(data)) data <- environment(formula)
    ## evaluate and install the model frame
    m <- match(c("formula", "data", "subset", "weights", "na.action", "offset"),
               names(mf), 0)
    mf <- mf[c(1, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1]] <- as.name("model.frame")
    fr <- eval(mf, parent.frame())

The point of all of this manipulation is to achieve the kind of result
you need, where the subset argument is evaluated in the correct
environment.
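
Applied to the foo() from the original post, the idiom might look
something like the sketch below. This is only an illustration, not
tested package code: the terms() simplification step is carried over
from the original foo(), and restoring the na.pass default by hand is
an assumption about the intended behaviour (arguments that take their
default value do not appear in match.call()).

```r
## Sketch only: foo() rewritten around the match.call() idiom.
foo <- function(formula, data = NULL, ..., subset = NULL,
                na.action = na.pass) {
    mf <- match.call()
    ## keep only the arguments that model.frame understands
    m <- match(c("formula", "data", "subset", "na.action"),
               names(mf), 0)
    mf <- mf[c(1, m)]
    ## simplify the formula as in the original foo(), so that
    ## ~ . - B is reduced to the variables actually retained
    mt <- terms(formula, data = data, simplify = TRUE)
    mf$formula <- formula(mt)
    ## defaults are not part of the matched call, so put the
    ## na.pass default back when the caller did not supply one
    if (!"na.action" %in% names(mf))
        mf$na.action <- quote(stats::na.pass)
    mf[[1]] <- as.name("model.frame")
    ## subset is still unevaluated here; model.frame evaluates it
    ## in data first and then in the formula's environment
    eval(mf, parent.frame())
}
```

With this version a call such as foo(~ . - B, data = dat, subset = A > 0.5)
hands the expression A > 0.5 to model.frame untouched, and leaving
subset out simply drops it from the call.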

> The problem I am having is in writing the function and wrapping it
> around model.frame. Consider the following data frame:
>
> dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
>
> And the wrapper function:
>
> foo <- function(formula, data = NULL, ..., subset = NULL,
>                na.action = na.pass) {
>    mt <- terms(formula, data = data, simplify = TRUE)
>    mf <- model.frame(formula(mt), data = data, subset = subset,
>                      na.action = na.action)
>    ## real function would do more stuff here and pass mf on to
>    ## other functions
>    mf
> }
>
> This is how I envisage the function being called. The real world use
> would have a data.frame with tens or hundreds of components where only a
> few need to be excluded. Hence wanting formulas of the form below to
> work.
>
> foo(~ . - B, data = dat)
>
> The aim is to return only columns A and C in an object returned by
> model.frame. However, when I run the above, I get the following error:
>
>> foo(~ A + B, data = dat)
> Error in xj[i] : invalid subscript type 'closure'
>
> I've tracked this down to the line in model.frame.default
>
>    subset <- eval(substitute(subset), data, env)
>
> After evaluating this line, subset contains:
>
> Browse[1]> subset
> function (x, ...)
> UseMethod("subset")
> <environment: namespace:base>
>
> Not NULL, and hence the error later on when calling the internal
> model.frame code.
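
The lookup described above can be reproduced outside model.frame. The
sketch below is a minimal illustration; inner_frame() and wrapper()
are hypothetical stand-ins for model.frame.default and foo, not the
real code. Because the wrapper passes subset = subset, substitute()
in the inner function recovers the symbol subset rather than NULL.
That symbol is looked up first in data and then in the formula's
environment, where the search ends at the function base::subset.

```r
## Hypothetical stand-in for the subset line of model.frame.default
inner_frame <- function(formula, data, subset = NULL) {
    env <- environment(formula)
    ## substitute(subset) is the expression the caller wrote
    eval(substitute(subset), data, env)
}

## Hypothetical stand-in for foo(): passes its own subset along
wrapper <- function(formula, data, subset = NULL) {
    inner_frame(formula, data = data, subset = subset)
}

dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
res <- wrapper(~ A + B, data = dat)
is.function(res)
```

Here the expression seen by inner_frame() is the name subset, which is
not a column of dat, so evaluation falls through to the enclosing
environment and picks up the closure.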
>
> So the question is, what am I doing wrong?
>
> If I leave the subset argument out of the definition of foo and rely
> upon the default in model.frame.default, the function works as
> expected.
>
> Perhaps the question should be, how do I modify foo() to allow it to
> have a formal subset argument, passed to model.frame?
>
> Any other suggestions gratefully accepted.
>
> Thanks in advance,
>
> G
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%