[Rd] evaluation in transform versus within

Joris Meys jorismeys at gmail.com
Wed Apr 1 22:38:12 CEST 2015


Thanks all, I see where I misunderstood the issue. I would still suggest
adding a warning to the help pages of with() and within(), similar to the
one already on the help pages of subset() and transform().

Cheers
Joris

On Wed, Apr 1, 2015 at 9:18 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:

> On 01/04/2015 2:33 PM, Joris Meys wrote:
>
>> Thank you for the insights. I understood as much from the code, but I
>> can't really see how this can cause a problem when using with() or within()
>> within a package or a function. The environments behave like I would
>> expect, as does the evaluation of the arguments. The second argument is
>> supposed to be an expression, so I would expect that expression to be
>> evaluated in the data frame first.
>>
>
> I don't know the context within which you were told that they are
> problematic, but one issue is that it makes typo detection harder, since
> the code analysis won't see typos.
>
> For example:
>
> df <- data.frame(col1 = 1)
> global <- 3
>
> with(df, col1 + global)  # fine
> with(df, col1 + Global)  # typo, but still no warning
>
> whereas
>
> df$col1 + global  # fine
> df$col1 + Global # "no visible binding for global variable 'Global'"
>
> and of course you'll get into a real mess later with the with() code if you
> add a column named "global" to your data frame.
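
A minimal sketch of that hazard inside a function, assuming hypothetical names (scale_by, multiplier, ok, clash) that do not appear in the thread. The same with() call silently changes meaning once the data frame gains a column that shares its name with a local variable:

```r
scale_by <- function(data, multiplier) {
  # `value` and `multiplier` are both looked up in `data` first;
  # only names missing from `data` fall through to this function's frame.
  with(data, value * multiplier)
}

ok    <- data.frame(value = 1:3)
clash <- data.frame(value = 1:3, multiplier = c(10, 20, 30))

res_ok    <- scale_by(ok, 2)     # the argument `multiplier` is used
res_clash <- scale_by(clash, 2)  # the column masks the argument
print(res_ok)
print(res_clash)
```

Writing data$value * multiplier instead keeps the lookup of multiplier in the function's own frame, whatever columns the data frame happens to have.
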
>
> Duncan Murdoch
>
>
>> I believed the warning in subset() and transform() refers to the
>> consequences of using the dots argument (...) and its evaluation inside
>> the function, but I might have misunderstood this. I've always considered
>> within() the programming equivalent of the convenience function transform().
>>
>> Sorry for using the r-devel list, but I reckoned this could have
>> consequences for package developers like me. More explicitly: if within()
>> poses the same risk as transform() (which I'm still not sure of), a warning
>> on the help page of within() would be appropriate, imho. I will use the
>> r-help list in the future.
>>
>> Kind regards
>> Joris
>>
>> On Wed, Apr 1, 2015 at 7:55 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
>> wrote:
>>
>>     On 01/04/2015 1:35 PM, Gabriel Becker wrote:
>>
>>         Joris,
>>
>>
>>         The second argument to evalq is envir, so that line says, roughly,
>>         "call environment() to generate me a new environment within the
>>         environment defined by data".
>>
>>
>>     I think that's not quite right.  environment() returns the current
>>     environment, it doesn't create a new one.  It is evalq() that
>>     created a new environment from data, and environment() just
>>     returns it.
>>
>>     Here's what happens.  I've put the code first, the description of
>>     what happens on the line below.
>>
>>         parent <- parent.frame()
>>
>>     Get the environment from which within.data.frame was called.
>>
>>         e <- evalq(environment(), data, parent)
>>
>>     Create a new environment containing the columns of data, with the
>>     parent being the environment where we were called.
>>     Return it and store it in e.
>>
>>         eval(substitute(expr), e)
>>
>>     Evaluate the expression in this new environment.
>>
>>         l <- as.list(e)
>>
>>     Convert it to a list.
>>
>>         l <- l[!vapply(l, is.null, NA, USE.NAMES = FALSE)]
>>
>>     Delete NULL entries from the list.
>>
>>         nD <- length(del <- setdiff(names(data), (nl <- names(l))))
>>
>>     Find out if any columns were deleted.
>>
>>         data[nl] <- l
>>
>>     Set the columns of data to the values from the list.
>>
>>         if (nD)
>>             data[del] <- if (nD == 1)
>>                 NULL
>>             else vector("list", nD)
>>         data
>>
>>     Delete the columns from data which were deleted from the list.
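
The steps above can be collected into one runnable function. This is only a sketch that mirrors the code quoted in this thread; the name within_sketch and the example data frame are hypothetical, and the within.data.frame shipped with a given R version may differ in detail:

```r
within_sketch <- function(data, expr) {
  parent <- parent.frame()                 # environment of the caller
  e <- evalq(environment(), data, parent)  # columns of data, enclosed by parent
  eval(substitute(expr), e)                # run the unevaluated expr in e
  l <- as.list(e)                          # everything now defined in e
  l <- l[!vapply(l, is.null, NA, USE.NAMES = FALSE)]  # drop NULL entries
  nD <- length(del <- setdiff(names(data), (nl <- names(l))))
  data[nl] <- l                            # update and add columns
  if (nD)                                  # remove columns deleted in expr
    data[del] <- if (nD == 1) NULL else vector("list", nD)
  data
}

df  <- data.frame(a = 1:3, b = 4:6)
out <- within_sketch(df, {
  total <- a + b   # new column
  rm(b)            # deleted column
})
print(out)
```

Here a keeps its position, b is dropped, and total is appended, because existing columns are overwritten in place while new names are added at the end.
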
>>
>>
>>
>>         Note that that line is only generating e, the environment that expr
>>         will be evaluated in on the next line (the call to eval). This means
>>         that expr is evaluated in an environment which is inside the
>>         environment defined by data, so you get non-standard evaluation in
>>         that symbols defined in data will be available to expr earlier in
>>         symbol lookup than those in the environment that within() was
>>         called from.
>>
>>
>>     This again sounds like there are two environments created, when
>>     really there's just one, but the last part is correct.
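
A small check of the "just one environment" point, as a sketch with hypothetical names (inspect_env, res): the environment returned by evalq() holds the columns directly, and its parent is the environment passed as the enclosure, with nothing in between.

```r
inspect_env <- function(data) {
  caller <- environment()                  # stands in for parent.frame()
  e <- evalq(environment(), data, caller)  # same idiom as in the quoted code
  list(
    columns          = sort(ls(e)),                     # names defined in e
    parent_is_caller = identical(parent.env(e), caller) # direct parent of e
  )
}

res <- inspect_env(data.frame(x = 1:3, y = 4:6))
print(res)
```
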
>>
>>     Duncan Murdoch
>>
>>
>>
>>         This is easy to confirm from the behavior of these functions:
>>
>>         > df = data.frame(x = 1:10, y = rnorm(10))
>>         > x = "I'm a character"
>>         > mean(x)
>>         [1] NA
>>         Warning message:
>>         In mean.default(x) : argument is not numeric or logical:
>>         returning NA
>>         > within(df, mean.x <- mean(x))
>>              x            y mean.x
>>         1   1  0.396758869    5.5
>>         2   2  0.945679050    5.5
>>         3   3  1.980039723    5.5
>>         4   4 -0.187059706    5.5
>>         5   5  0.008220067    5.5
>>         6   6  0.451175885    5.5
>>         7   7 -0.262064017    5.5
>>         8   8 -0.652301191    5.5
>>         9   9  0.673609455    5.5
>>         10 10 -0.075590905    5.5
>>         > with(df, mean(x))
>>         [1] 5.5
>>
>>         P.S. this is probably an r-help question.
>>
>>         Best,
>>         ~G
>>
>>
>>
>>
>>         On Wed, Apr 1, 2015 at 10:21 AM, Joris Meys <jorismeys at gmail.com> wrote:
>>
>>         > Dear list members,
>>         >
>>         > I'm a bit confused about the evaluation of expressions using with() or
>>         > within() versus subset() and transform(). I always teach my students to use
>>         > with() and within() because of the warning mentioned in the help pages of
>>         > subset() and transform(). Those two functions use nonstandard evaluation
>>         > and are to be used only interactively.
>>         >
>>         > I've never seen that warning on the help page of with() and within(), so I
>>         > assumed both functions can safely be used in functions and packages. I've
>>         > now been told that both functions pose the same risk as subset() and
>>         > transform().
>>         >
>>         > Looking at the source code I've noticed the extra step:
>>         >
>>         > e <- evalq(environment(), data, parent)
>>         >
>>         > which, at least according to my understanding, should ensure that the
>>         > functions follow the standard evaluation rules. Could somebody with more
>>         > knowledge than I have shed a bit of light on this issue?
>>         >
>>         > Thank you
>>         > Joris
>>         >
>>         > --
>>         > Joris Meys
>>         > Statistical consultant
>>         >
>>         > Ghent University
>>         > Faculty of Bioscience Engineering
>>         > Department of Mathematical Modelling, Statistics and Bio-Informatics
>>         >
>>         > tel : +32 (0)9 264 61 79
>>         > Joris.Meys at Ugent.be
>>         > -------------------------------
>>         > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>         >
>>         >         [[alternative HTML version deleted]]
>>         >
>>         > ______________________________________________
>>         > R-devel at r-project.org mailing list
>>         > https://stat.ethz.ch/mailman/listinfo/r-devel
>>         >
>
>


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
