[Rd] New pipe operator

Duncan Murdoch murdoch.duncan using gmail.com
Mon Dec 7 11:31:32 CET 2020


On 06/12/2020 8:22 p.m., Bravington, Mark (Data61, Hobart) wrote:
> Seems like this *could* be a good thing, and thanks to R core for considering it. But, FWIW:
> 
>   - I agree with Gabor G that consistency of "syntax" should be paramount here. Enough problems have been caused by earlier superficially-convenient non-standard features in R.  In particular:
> 
>   -- there should not be any discrepancy between an in-place function-definition, and a predefined function attached to a symbol (as per Gabor's point).
>   
>   -- Hence, the ability to say x |> foo, i.e. without parentheses, seems bound to lead to inconsistency, because x |> foo is allowed, x |> base::foo isn't allowed without tricks, but x |> function( y) foo( y) is... So, x |> foo is not worth keeping. Parentheses are a price well worth paying.
>   
>   -- it is still inconsistent and confusing to (apparently) invoke a function in some places--- normally--- via 'foo(x)', yet in others--- pipily--- via 'foo()'. Especially if 'foo' already has a default value for its first argument.
> 
>   - I don't see the problem with a placeholder--- doesn't it remove all ambiguity? Sure there needs to be a standard unclashable name and people can argue about what that should be, but the following seems clear and flexible... to me, anyway:
>   
>   thing |>
>     foo( _PIPE_) |>           # standard
>     bah( arg1, _PIPE_) |>   # multi-arg function
>     _ANON_({ x <- sum( _PIPE_); _PIPE_/x + x/_PIPE_ })   # anon function
>    
> where '_PIPE_' is the ordained name of the placeholder, and '_ANON_' constructs-and-calls a function with single argument '_PIPE_'. There is just one rule (I think...): each pipe-stage must be a *call* involving the argument '_PIPE_'.

I believe there's no ambiguity if the placeholder is *only* allowed in  
the RHS of a pipe expression.  I think the ambiguity arises if you allow  
the same syntax to be used to generate anonymous functions.  We can't  
use _PIPE_ as the placeholder, because it's a legal name.  But we could  
use _.  Then

   x |> (_ + 1) + mean(_)

could expand unambiguously to

   (function(_) (_  + 1) + mean(_))(x)

but

   (_ + 1) + mean(_)

shouldn't be taken to be an anonymous function declaration, otherwise  
things like

   mean(_ |> _)

do become ambiguous:  is the second placeholder the argument to the anon  
function, or is it the placeholder for the embedded pipe?

However, implementing this makes the parser pretty ugly:  its handling  
of _ depends on the outer context.  I now agree that leaving out  
placeholder syntax was the right decision.
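
For reference, the expansion described above can be written out by hand
in ordinary R.  This is only a sketch:  since _ isn't accepted as a name
by the parser, I use .x as a stand-in (my choice, purely for
illustration), and I write the expanded form directly rather than
relying on any pipe syntax.

```r
# `.x` is a hypothetical stand-in for the `_` placeholder.
x <- c(1, 2, 3)

# Hand-written form of what  x |> (_ + 1) + mean(_)  would expand to
expanded <- (function(.x) (.x + 1) + mean(.x))(x)

# The same computation written without any function wrapper
direct <- (x + 1) + mean(x)

print(expanded)
print(identical(expanded, direct))
```

Both occurrences of the placeholder refer to the same function argument,
so the left-hand side is evaluated only once.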


> 
> 
>   - The proposed anonymous-function syntax looks quite ugly to me, diminishing readability and inviting errors. The new pipe symbol |> already looks scarily like quantum mechanics; adding \( just puts fishbones into the symbolic soup.
> 
>   - IMO it's not worth going too far to try to lure magrittr-etc fans to swap to the new; my experience is that many people keep using older inferior R syntax for years after better replacements become available (even if they are aware of replacements), for various reasons. Just provide a good framework, and let nature take its course.
>   
>   - Disclaimer: personally I'm not much of a pipehead anyway, so maybe I'm not the audience. But if I was to consider piping, I wouldn't be very tempted by the current proposal. OTOH, I might even be tempted to write--- and use!--- my own version of '%|>%' as above (maybe someone already has). And if R did it for me, that'd be great :)

Yours would suffer one of the same problems as magrittr's:  it has the  
wrong operator precedence.  The current precedence ordering (from  
?Syntax) is, from highest to lowest:


:: :::	access variables in a namespace
$ @	component / slot extraction
[ [[	indexing
^	exponentiation (right to left)
- +	unary minus and plus
:	sequence operator
%any%	special operators (including %% and %/%)
* /	multiply, divide
+ -	(binary) add, subtract
< > <= >= == !=	ordering and comparison
!	negation
& &&	and
| ||	or
~	as in formulae
-> ->>	rightwards assignment
<- <<-	assignment (right to left)
=	assignment (right to left)
?	help (unary and binary)


The %>% operator, like every %any% operator, has higher precedence than
the binary arithmetic operators * / + -, so

x*y %>% f()

is equivalent to x*f(y), not

f(x*y)

as it should "obviously" be.  I believe the new |> operator falls  
between "| ||" and "~", so

x || y |> f()

is the same as f(x || y), and

x ~ y |> f()

is x ~ f(y).   There could be arguments about where the new one appears  
(and there probably have been), but *clearly* magrittr's precedence is  
wrong, and yours would be too, because they are both fixed at the quite  
high precedence given to %any%.
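
The grouping for %any% operators can be checked without loading
magrittr, because quote() only parses the expression and every %any%
operator gets the same precedence.  A small sketch (the variable names
are mine):

```r
# quote() parses without evaluating, so x, y, f and %>% need not exist.
e <- quote(x * y %>% f())

top_op <- as.character(e[[1]])   # operator at the top of the parse tree
right  <- deparse(e[[3]])        # right-hand operand of that operator

print(top_op)
print(right)

# A user-defined operator such as %|>% is parsed the same way
e2 <- quote(x * y %|>% f())
print(as.character(e2[[1]]))
```

In both cases the top-level call is the multiplication, with the pipe
nested inside its right operand.  On a build that has |>, the same
approach applied to quote(x || y |> f()) shows where the new operator
sits.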

Duncan Murdoch

>   
> [*] Definition of _ANON_ could be something like this--- almost certainly won't work as-is, this is just to point out that it could be done in standard R.
> 
> `_ANON_` <- function( expr) {
>    #1. Construct a function with arg '_PIPE_' and body 'expr'
>    #2. Construct a call() to that function
>    #3. Do the call
> 
>    f <- function( `_PIPE_`) NULL
>    body( f) <- expr
>    environment( f) <- parent.frame() # or something... yes these details are almost certainly wrong
>    expr2 <- substitute( f( `_PIPE_`)) # or something...
>    eval.parent( expr2) # or something...
> }
> 
> cheers
> Mark
> 
> Mark Bravington
> CSIRO Marine Lab
> Hobart
> Australia
> 
> 
> ________________________________________
> From: R-devel <r-devel-bounces using r-project.org> on behalf of Gabor Grothendieck <ggrothendieck using gmail.com>
> Sent: Monday, 7 December 2020 10:21
> To: Gabriel Becker
> Cc: r-devel using r-project.org
> Subject: Re: [Rd] New pipe operator
> 
> I understand very well that it is implemented at the syntax level;
> however, in any case the implementation is irrelevant to the principles.
> 
> Here is a similar example to the one I gave before, but this time written out:
> 
> This works:
> 
>    3 |> function(x) x + 1
> 
> but this does not:
> 
>    foo <- function(x) x + 1
>    3 |> foo
> 
> so it breaks the principle of functions being first class objects.  foo and its
> definition are not interchangeable.  You have
> to write 3 |> foo() but don't have to write 3 |> (function(x) x + 1)().
> 
> This isn't just a matter of notation, i.e. foo vs foo(), but is a
> matter of breaking
> the way R works as a functional language with first class functions.
> 
> On Sun, Dec 6, 2020 at 4:06 PM Gabriel Becker <gabembecker using gmail.com> wrote:
>>
>> Hi Gabor,
>>
>> On Sun, Dec 6, 2020 at 12:52 PM Gabor Grothendieck <ggrothendieck using gmail.com> wrote:
>>>
>>> I think the real issue here is that functions are supposed to be
>>> first class objects in R, and |> would break that if it is possible
>>> to write function(x) x + 1 on the RHS but not foo (assuming foo
>>> was defined as that function).
>>>
>>> I don't think getting experience with using it can change that
>>> inconsistency which seems serious to me and needs to
>>> be addressed even if it complicates the implementation
>>> since it drives to the heart of what R is.
>>>
>>
>> With respect I think this is a misunderstanding of what is happening here.
>>
>> Functions are first class citizens. |> is, for all intents and purposes, a macro.
>>
>> LHS |> RHS(arg2=5)
>>
>> parses to
>>
>> RHS(LHS, arg2 = 5)
>>
>> There are no functions at the point in time when the pipe transformation happens, because no code has been evaluated. To know if a symbol is going to evaluate to a function requires evaluation which is a step entirely after the one where the |> pipe is implemented.
>>
>> Another way to think about it is that
>>
>> LHS |> RHS(arg2 = 5)
>>
>> is another way of writing RHS(LHS, arg2 = 5), NOT R code that is (or even can be) evaluated.
>>
>>
>> Now this is a subtle point that only really has implications inasmuch as it is not the case for magrittr pipes, but it's relevant for discussions like this, I think.
>>
>> ~G
>>
>>> On Sat, Dec 5, 2020 at 1:08 PM Gabor Grothendieck
>>> <ggrothendieck using gmail.com> wrote:
>>>>
>>>> The construct utils::head  is not that common but bare functions are
>>>> very common and to make it harder to use the common case so that
>>>> the uncommon case is slightly easier is not desirable.
>>>>
>>>> Also it is trivial to write this which does work:
>>>>
>>>> mtcars %>% (utils::head)
>>>>
>>>> On Sat, Dec 5, 2020 at 11:59 AM Hugh Parsonage <hugh.parsonage using gmail.com> wrote:
>>>>>
>>>>> I'm surprised by the aversion to
>>>>>
>>>>> mtcars |> nrow
>>>>>
>>>>> over
>>>>>
>>>>> mtcars |> nrow()
>>>>>
>>>>> and I think the decision to disallow the former should be
>>>>> reconsidered.  The pipe operator is only going to be used when the rhs
>>>>> is a function, so there is no ambiguity with omitting the parentheses.
>>>>> If it's disallowed, it becomes inconsistent with other treatments like
>>>>> sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be
>>>>> noise.  I'm not sure why this decision was taken
>>>>>
>>>>> If the only issue is with the double (and triple) colon operator, then
>>>>> ideally `mtcars |> base::head` should resolve to `base::head(mtcars)`
>>>>> -- in other words, demote the precedence of |>
>>>>>
>>>>> Obviously (looking at the R-Syntax branch) this decision was
>>>>> considered, put into place, then dropped, but I can't see why
>>>>> precisely.
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>> Hugh.
>>>>>
>>>>> On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar <deepayan.sarkar using gmail.com> wrote:
>>>>>>
>>>>>> On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>>>>>>
>>>>>>> On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote:
>>>>>>>>>    Error: function '::' not supported in RHS call of a pipe
>>>>>>>>
>>>>>>>> To me, this error looks much more friendly than magrittr's error.
>>>>>>>> Some users have got too used to specifying functions without (). This
>>>>>>>> is OK until they use `::`, but when they need to use it, it takes
>>>>>>>> hours to figure out why
>>>>>>>>
>>>>>>>> mtcars %>% base::head
>>>>>>>> #> Error in .::base : unused argument (head)
>>>>>>>>
>>>>>>>> won't work but
>>>>>>>>
>>>>>>>> mtcars %>% head
>>>>>>>>
>>>>>>>> works. I think this is too harsh a way for ordinary R users to
>>>>>>>> learn that `::` is a function. I've been wanting magrittr to drop
>>>>>>>> support for a function name without () to avoid this confusion,
>>>>>>>> so I would very much welcome the new pipe operator's behavior.
>>>>>>>> Thank you to all the developers who implemented this!
>>>>>>>
>>>>>>> I agree, it's an improvement on the corresponding magrittr error.
>>>>>>>
>>>>>>> I think the semantics of not evaluating the RHS, but treating the pipe
>>>>>>> as purely syntactical is a good decision.
>>>>>>>
>>>>>>> I'm not sure I like the recommended way to pipe into a particular argument:
>>>>>>>
>>>>>>>     mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d)
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>>     mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d)
>>>>>>>
>>>>>>> both of which are equivalent to
>>>>>>>
>>>>>>>     mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))()
>>>>>>>
>>>>>>> It's tempting to suggest it should allow something like
>>>>>>>
>>>>>>>     mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)
>>>>>>
>>>>>> Which is really not that far off from
>>>>>>
>>>>>> mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .)
>>>>>>
>>>>>> once you get used to it.
>>>>>>
>>>>>> One consequence of the implementation is that it's not clear how
>>>>>> multiple occurrences of the placeholder would be interpreted. With
>>>>>> magrittr,
>>>>>>
>>>>>> sort(runif(10)) %>% ecdf(.)(.)
>>>>>> ## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
>>>>>>
>>>>>> This is probably what you would expect, if you expect it to work at all, and not
>>>>>>
>>>>>> ecdf(sort(runif(10)))(sort(runif(10)))
>>>>>>
>>>>>> There would be no such ambiguity with anonymous functions
>>>>>>
>>>>>> sort(runif(10)) |> \(.) ecdf(.)(.)
>>>>>>
>>>>>> -Deepayan
>>>>>>
>>>>>>> which would be expanded to something equivalent to the other versions,
>>>>>>> but that makes it quite a bit more complicated.  (Maybe _ or \. should
>>>>>>> be used instead of ., since those are not legal variable names.)
>>>>>>>
>>>>>>> I don't think there should be an attempt to copy magrittr's special
>>>>>>> casing of how . is used in determining whether to also include the
>>>>>>> previous value as first argument.
>>>>>>>
>>>>>>> Duncan Murdoch
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Hiroaki Yutani
>>>>>>>>
>>>>>>>> On Fri, 4 Dec 2020 at 20:51, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Just saw this on the R-devel news:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> R now provides a simple native pipe syntax ‘|>’ as well as a shorthand
>>>>>>>>> notation for creating functions, e.g. ‘\(x) x + 1’ is parsed as
>>>>>>>>> ‘function(x) x + 1’. The pipe implementation as a syntax transformation
>>>>>>>>> was motivated by suggestions from Jim Hester and Lionel Henry. These
>>>>>>>>> features are experimental and may change prior to release.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is a good addition; by using "|>" instead of "%>%" there should be
>>>>>>>>> a chance to get operator precedence right.  That said, the ?Syntax help
>>>>>>>>> topic hasn't been updated, so I'm not sure where it fits in.
>>>>>>>>>
>>>>>>>>> There are some choices that take a little getting used to:
>>>>>>>>>
>>>>>>>>>    > mtcars |> head
>>>>>>>>> Error: The pipe operator requires a function call or an anonymous
>>>>>>>>> function expression as RHS
>>>>>>>>>
>>>>>>>>> (I need to say mtcars |> head() instead.)  This sometimes leads to error
>>>>>>>>> messages that are somewhat confusing:
>>>>>>>>>
>>>>>>>>>    > mtcars |> magrittr::debug_pipe |> head
>>>>>>>>> Error: function '::' not supported in RHS call of a pipe
>>>>>>>>>
>>>>>>>>> but
>>>>>>>>>
>>>>>>>>> mtcars |> magrittr::debug_pipe() |> head()
>>>>>>>>>
>>>>>>>>> works.
>>>>>>>>>
>>>>>>>>> Overall, I think this is a great addition, though it's going to be
>>>>>>>>> disruptive for a while.
>>>>>>>>>
>>>>>>>>> Duncan Murdoch
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-devel using r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>>
>>>>
>>>> --
>>>> Statistics & Software Consulting
>>>> GKX Group, GKX Associates Inc.
>>>> tel: 1-877-GKX-GROUP
>>>> email: ggrothendieck at gmail.com