[Rd] Most efficient way to check the length of a variable mentioned in a formula.

Joris Meys jorismeys at gmail.com
Sat Oct 18 21:56:53 CEST 2014


Thanks again William, I owe you one!
Cheers
Joris

On Fri, Oct 17, 2014 at 11:36 PM, William Dunlap <wdunlap at tibco.com> wrote:

> In my example function I did not evaluate the formula either, just a part
> of it.
>
> If you leave off the envir and enclos arguments to eval in your
> function you can get surprising (wrong) results.  E.g.,
>   > afun(y ~ varnames)
>   [[1]]
>    [1] 10  9  8  7  6  5  4  3  2  1
>
>   [[2]]
>   [1] "y"        "varnames"
>
> If you want to use the variables in data or environment(formula) and
> some functions defined in your function, then you could make a child
> environment of environment(formula), put your locally defined
> functions in it, and use the child environment in the call to eval.
> E.g., your code would become
> afun2 <- function(formula, ...){
>
>     varnames <- all.vars(formula)
>     fenv <- environment(formula)
>
>     n <- length(eval(as.name(varnames[1]), envir=fenv))
>     childEnv <- new.env(parent=fenv)
>     childEnv$fun <- function(x) x/n
>
>     myterms <- terms(formula)
>     eval(attr(myterms, 'variables'), envir=childEnv)
> }
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
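A minimal sketch of the name clash described above, assuming a hypothetical caller whose formula environment holds its own 'varnames' vector. afun and afun2 are copied from the thread; the local() wrapper and its data are illustrative assumptions, not part of the original messages.

```r
# Version from the thread: the final eval() runs in afun's own frame,
# so names local to afun (such as 'varnames') shadow the caller's data.
afun <- function(formula, ...) {
    varnames <- all.vars(formula)
    fenv <- environment(formula)

    txt <- paste('length(', varnames[1], ')')
    n <- eval(parse(text = txt), envir = fenv)

    fun <- function(x) x / n

    myterms <- terms(formula)
    eval(attr(myterms, 'variables'))
}

# Child-environment version from the thread: only 'fun' is added on top
# of the formula's environment, so afun2's locals cannot leak in.
afun2 <- function(formula, ...) {
    varnames <- all.vars(formula)
    fenv <- environment(formula)

    n <- length(eval(as.name(varnames[1]), envir = fenv))
    childEnv <- new.env(parent = fenv)
    childEnv$fun <- function(x) x / n

    myterms <- terms(formula)
    eval(attr(myterms, 'variables'), envir = childEnv)
}

# Hypothetical caller: the formula refers to the caller's own 'varnames'.
demo <- local({
    varnames <- c("a", "b", "c")
    f <- ~ varnames
    list(frame = afun(f), child = afun2(f))
})

demo$frame[[1]]  # "varnames": the function's local variable was picked up
demo$child[[1]]  # "a" "b" "c": the caller's data, as intended
```

The child environment keeps the lookup chain rooted in environment(formula), which is why only the second call returns the caller's data.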
> On Fri, Oct 17, 2014 at 1:50 PM, Joris Meys <jorismeys at gmail.com> wrote:
> > Thank you both, great ideas.  William, I see the point of using eval, but
> > the problem is that I can't evaluate the formula itself yet. I need to know
> > the length of these variables to create a function that is used in the
> > evaluation. So if I try to evaluate the formula in some way before I have
> > created the function, it will just return an error.
> >
> > Now I use the 'variables' attribute of the formula terms to get the
> > variables that, after some more manipulation, will eventually become the
> > model matrix. Something like this:
> >
> > afun <- function(formula, ...){
> >
> >     varnames <- all.vars(formula)
> >     fenv <- environment(formula)
> >
> >     txt <- paste('length(',varnames[1],')')
> >     n <- eval(parse(text=txt), envir=fenv)
> >
> >     fun <- function(x) x/n
> >
> >     myterms <- terms(formula)
> >     eval(attr(myterms, 'variables'))
> >
> > }
> >
> > And that should give:
> >
> > > x <- 1:10
> > > y <- 10:1
> > > z <- 11:20
> > > afun(z ~ fun(x) + y)
> > [[1]]
> >  [1] 11 12 13 14 15 16 17 18 19 20
> >
> > [[2]]
> >  [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> >
> > [[3]]
> >  [1] 10  9  8  7  6  5  4  3  2  1
> >
> > It might be I'm walking to Paris over Singapore, but I couldn't find a
> > better way to do it.
> >
> > Cheers
> > Joris
> >
> > On Fri, Oct 17, 2014 at 10:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
> >>
> >> I got the default value for getRHSLength's data argument wrong - it
> >> should be NULL, not parent.frame().
> >>    getRHSLength <- function (formula, data = NULL)
> >>    {
> >>        rhsExpr <- formula[[length(formula)]]
> >>        rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
> >>        length(rhsValue)
> >>    }
> >> so that the function firstHalf is found in the following
> >>    > X <- 1:10
> >>    > getRHSLength((function(){firstHalf<-function(x)x[seq_len(floor(length(x)/2))]; ~firstHalf(X)})())
> >>    [1] 5
> >>
> >>
> >>
> >>
> >> On Fri, Oct 17, 2014 at 11:57 AM, William Dunlap <wdunlap at tibco.com> wrote:
> >> > I would use eval(), but I think that most formula-using functions do
> >> > it more like the following.
> >> >
> >> > getRHSLength <-
> >> > function (formula, data = parent.frame())
> >> > {
> >> >     rhsExpr <- formula[[length(formula)]]
> >> >     rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
> >> >     length(rhsValue)
> >> > }
> >> >
> >> > * use eval() instead of get() so you will find variables that are in
> >> > ancestral environments of envir (if envir is an environment), not just
> >> > in envir itself.
> >> > * just evaluate the stuff in the formula using the non-standard
> >> > evaluation frame, and call length() in the current frame.  Otherwise, if
> >> > envir inherits directly from emptyenv() the 'length' function will not
> >> > be found.
> >> > * use envir=data so it looks first in the data argument for variables
> >> > * the enclos argument is used if envir is not an environment and is used
> >> > to find variables that are not in envir.
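The second point in the list above can be sketched as follows. This is an illustrative example only: the environment e and its contents are assumptions made up for the demonstration, not code from the thread.

```r
# Hypothetical environment that inherits directly from emptyenv(),
# so no base functions are visible from inside it.
e <- new.env(parent = emptyenv())
assign("X", 1:4, envir = e)

# Evaluating the whole call inside e fails, because the function
# 'length' itself cannot be found from e.
inside <- tryCatch(
    eval(quote(length(X)), envir = e),
    error = function(err) conditionMessage(err)
)

# Evaluating only the variable inside e and calling length() in the
# current frame works.
outside <- length(eval(quote(X), envir = e))

inside   # an error message instead of a length
outside  # 4
```

Keeping the call to length() outside the evaluated expression means the result does not depend on what the data environment happens to inherit from.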
> >> >
> >> > Here are some examples:
> >> >   > X <- 1:10
> >> >   > getRHSLength(~X)
> >> >   [1] 10
> >> >   > getRHSLength(~X, data=data.frame(X=1:2))
> >> >   [1] 2
> >> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame())
> >> >   [1] 4
> >> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame(X=1:2))
> >> >   [1] 2
> >> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=list2env(data.frame()))
> >> >   [1] 10
> >> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=emptyenv())
> >> >   Error in eval(expr, envir, enclos) : object 'X' not found
> >> >
> >> > I think you will see the same lookups if you try analogous things with lm().
> >> >
> >> >
> >> > On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorismeys at gmail.com> wrote:
> >> >> Dear R gurus,
> >> >>
> >> >> I need to know the length of a variable (let's call that X) that is
> >> >> mentioned in a formula. So obviously I look for the environment from
> >> >> which the formula is called and then I have two options:
> >> >>
> >> >> - using eval(parse(text='length(X)'),
> >> >>                     envir=environment(formula) )
> >> >>
> >> >> - using length(get('X',
> >> >>             envir=environment(formula) ))
> >> >>
> >> >> A bit of benchmarking showed that the first option is about 20 times
> >> >> slower, to the extent that over 10,000 repetitions the second option
> >> >> saves more than half a second. So speed is not really an issue here.
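A rough way to reproduce that comparison, as a sketch only: the variable X, the formula f, and the use of system.time() are assumptions for illustration, and the timings will vary by machine and R version, so no figures are given here.

```r
X <- 1:10
f <- ~ X
reps <- 10000L  # repetition count mentioned in the message above

# Option 1: parse and evaluate a text expression in the formula's environment.
t_parse <- system.time(
    for (i in seq_len(reps)) {
        n1 <- eval(parse(text = "length(X)"), envir = environment(f))
    }
)

# Option 2: fetch the object by name, then take its length in the current frame.
t_get <- system.time(
    for (i in seq_len(reps)) {
        n2 <- length(get("X", envir = environment(f)))
    }
)

# Elapsed seconds for each option.
print(c(parse = t_parse[["elapsed"]], get = t_get[["elapsed"]]))
```

Both options return the same length here; they differ only in how the lookup is done and in how much work is repeated on every call.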
> >> >>
> >> >> Personally I'd go for option 2 as that one is easier to read and does
> >> >> the job nicely, but with these functions I'm always a bit afraid that
> >> >> I'm overlooking important details or side effects (possibly memory
> >> >> issues when working with larger data).
> >> >>
> >> >> Does anybody have an idea what the dangers of these methods are, and
> >> >> which one is the most robust?
> >> >>
> >> >> Thank you
> >> >> Joris
> >> >>
> >> >> --
> >> >> Joris Meys
> >> >> Statistical consultant
> >> >>
> >> >> Ghent University
> >> >> Faculty of Bioscience Engineering
> >> >> Department of Mathematical Modelling, Statistics and Bio-Informatics
> >> >>
> >> >> tel : +32 9 264 59 87
> >> >> Joris.Meys at Ugent.be
> >> >> -------------------------------
> >> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >> >>
> >> >> ______________________________________________
> >> >> R-devel at r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-devel


