[Rd] evaluation in transform versus within

Wed Apr 1 20:52:23 CEST 2015

There is no important difference between transform() and within(). They
have the same pitfalls. If your general code is unable to guarantee the
scope of the symbol resolution, the behavior of the code is unlikely to be
very predictable.
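To make the pitfall concrete, here is a small R sketch. The names scale_col and scale_col_t and the data are hypothetical, made up for illustration. Both within() and transform() look up symbols among the columns first and only then in the calling function, so a column that happens to share a name with a local variable silently takes precedence:

```r
# Hypothetical helpers, for illustration only: both are meant to
# multiply column y by the argument x.
scale_col <- function(df, x) {
  # within() builds an environment from the columns of df whose parent
  # is this function's frame, so a column named x shadows the argument x.
  within(df, z <- y * x)
}

scale_col_t <- function(df, x) {
  # transform() resolves symbols the same way: columns first, then caller.
  transform(df, z = y * x)
}

with_x    <- data.frame(x = 1:3, y = c(10, 20, 30))
without_x <- data.frame(y = c(10, 20, 30))

scale_col(with_x, 2)$z     # 10 40 90: the column x was used
scale_col(without_x, 2)$z  # 20 40 60: the argument x was used
scale_col_t(with_x, 2)$z   # 10 40 90: same behaviour with transform()
```

Inside a package function the data frame usually comes from the user, so the result can depend on column names the function author never saw.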

I've explored some solutions in the S4Vectors package; see
S4Vectors:::safeEval. We use that for subset(), etc., on the core Bioc data
structures. Basically, the user is encouraged to escape symbols that should
be resolved in the enclosing environment (not data) with the .() notation
of bquote(). This clarifies the expectations of the programmer and protects
against problems. Later, active bindings are (optionally) employed to catch
any attempts at resolving a symbol in the enclosing environment, which
results in a warning. Just an experiment; feedback welcome.
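A rough base-R sketch of the .() escaping idea follows. It assumes nothing about how S4Vectors:::safeEval is actually written; eval_in_data and scale_y are hypothetical names. Instead of the optional active-binding warnings described above, this simplification just puts baseenv() behind the data, so an unescaped reference to a caller variable fails with an error rather than being resolved quietly:

```r
# Hypothetical helper sketching the .() idea; not the S4Vectors code.
eval_in_data <- function(data, expr) {
  caller <- parent.frame()
  # Replace every .(...) term with its value, computed in the caller's
  # environment, before the data are consulted at all.
  expr <- do.call(bquote, list(substitute(expr), where = caller))
  # Evaluate with the columns of data in scope and only base R behind
  # them: an unescaped name missing from data is an error, not a lookup
  # in the caller's environment.
  eval(expr, data, baseenv())
}

scale_y <- function(dat, x) eval_in_data(dat, y * .(x))

dat <- data.frame(x = 1:3, y = c(10, 20, 30))
scale_y(dat, 2)                           # 20 40 60: .(x) is the argument
try(eval_in_data(dat, y * not_a_column))  # error: object 'not_a_column' not found
```

The sketch has limits: a forgotten .() on a name that is also a column still picks up the column, and functions outside base (stats::sd, say) must be escaped too, e.g. .(sd)(y), or reached through a different enclosure.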

Michael

On Wed, Apr 1, 2015 at 11:33 AM, Joris Meys <jorismeys at gmail.com> wrote:

> Thank you for the insights. I understood as much from the code, but I can't
> really see how this can cause a problem when using with() or within()
> within a package or a function. The environments behave like I would
> expect, as does the evaluation of the arguments. The second argument is
> supposed to be an expression, so I would expect that expression to be
> evaluated in the data frame first.
>
> I believed the warning in subset() and transform() referred to the
> consequences of using the dotted argument and the evaluation thereof inside
> the function, but I might have misunderstood this. I've always considered
> within() the programming equivalent of the convenience function
> transform().
>
> Sorry for using the r-devel list, but I reckoned this could have
> consequences for package developers like me. More explicitly: if within()
> poses the same risk as transform() (which I'm still not sure of), a warning
> on the help page of within() would be appropriate, imho. I will use the r-help
> list in the future.
>
> Kind regards
> Joris
>
> On Wed, Apr 1, 2015 at 7:55 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> > On 01/04/2015 1:35 PM, Gabriel Becker wrote:
> >
> >> Joris,
> >>
> >>
> >> The second argument to evalq is envir, so that line says, roughly, "call
> >> environment() to generate me a new environment within the environment
> >> defined by data".
> >>
> >
> > I think that's not quite right.  environment() returns the current
> > environment; it doesn't create a new one.  It is evalq() that creates a
> > new environment from data, and environment() just returns it.
> >
> > Here's what happens.  I've put the code first, the description of what
> > happens on the line below.
> >
> >     parent <- parent.frame()
> >
> > Get the environment from which within.data.frame was called.
> >
> >     e <- evalq(environment(), data, parent)
> >
> > Create a new environment containing the columns of data, with the parent
> > being the environment where we were called.
> > Return it and store it in e.
> >
> >     eval(substitute(expr), e)
> >
> > Evaluate the expression in this new environment.
> >
> >     l <- as.list(e)
> >
> > Convert it to a list.
> >
> >     l <- l[!vapply(l, is.null, NA, USE.NAMES = FALSE)]
> >
> > Delete NULL entries from the list.
> >
> >     nD <- length(del <- setdiff(names(data), (nl <- names(l))))
> >
> > Find out if any columns were deleted.
> >
> >     data[nl] <- l
> >
> > Set the columns of data to the values from the list.
> >
> >     if (nD)
> >         data[del] <- if (nD == 1)
> >             NULL
> >         else vector("list", nD)
> >     data
> >
> > Delete the columns from data which were deleted from the list.
> >
> >
> >
> >> Note that that line is only generating e, the environment that expr will
> >> be evaluated in on the next line (the call to eval). This means that expr
> >> is evaluated in an environment which is inside the environment defined by
> >> data, so you get non-standard evaluation in that symbols defined in data
> >> will be available to expr earlier in symbol lookup than those in the
> >> environment that within() was called from.
> >>
> >
> > This again sounds like there are two environments created, when really
> > there's just one, but the last part is correct.
> >
> > Duncan Murdoch
> >
> >
> >
> >> This is easy to confirm from the behavior of these functions:
> >>
> >> > df = data.frame(x = 1:10, y = rnorm(10))
> >> > x = "I'm a character"
> >> > mean(x)
> >> [1] NA
> >> Warning message:
> >> In mean.default(x) : argument is not numeric or logical: returning NA
> >> > within(df, mean.x <- mean(x))
> >>      x            y mean.x
> >> 1   1  0.396758869    5.5
> >> 2   2  0.945679050    5.5
> >> 3   3  1.980039723    5.5
> >> 4   4 -0.187059706    5.5
> >> 5   5  0.008220067    5.5
> >> 6   6  0.451175885    5.5
> >> 7   7 -0.262064017    5.5
> >> 8   8 -0.652301191    5.5
> >> 9   9  0.673609455    5.5
> >> 10 10 -0.075590905    5.5
> >> > with(df, mean(x))
> >> [1] 5.5
> >>
> >> P.S. this is probably an r-help question.
> >>
> >> Best,
> >> ~G
> >>
> >>
> >>
> >>
> >> On Wed, Apr 1, 2015 at 10:21 AM, Joris Meys <jorismeys at gmail.com> wrote:
> >>
> >> > Dear list members,
> >> >
> >> > I'm a bit confused about the evaluation of expressions using with() or
> >> > within() versus subset() and transform(). I always teach my students to
> >> > use with() and within() because of the warning mentioned in the help
> >> > pages of subset() and transform(). Both functions use nonstandard
> >> > evaluation and are to be used only interactively.
> >> >
> >> > I've never seen that warning on the help page of with() and within(), so
> >> > I assumed both functions can safely be used in functions and packages.
> >> > I've now been told that both functions pose the same risk as subset()
> >> > and transform().
> >> >
> >> > Looking at the source code I've noticed the extra step:
> >> >
> >> > e <- evalq(environment(), data, parent)
> >> >
> >> > which, at least according to my understanding, should ensure that the
> >> > functions follow the standard evaluation rules. Could somebody with more
> >> > knowledge than I have shed a bit of light on this issue?
> >> >
> >> > Thank you
> >> > Joris
> >> >
> >> > --
> >> > Joris Meys
> >> > Statistical consultant
> >> >
> >> > Ghent University
> >> > Faculty of Bioscience Engineering
> >> > Department of Mathematical Modelling, Statistics and Bio-Informatics
> >> >
> >> > tel :  +32 (0)9 264 61 79
> >> > Joris.Meys at Ugent.be
> >> > -------------------------------
> >> > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >> >
> >> > ______________________________________________
> >> > R-devel at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >> >
> >>
> >>
> >>
> >>
> >
>
