[R] Pipe operator

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Tue Jan 3 20:22:36 CET 2023


At 19:14 on 03/01/2023, Rui Barradas wrote:
> At 17:35 on 03/01/2023, Greg Snow wrote:
>> To expand a little on Christopher's answer.
>>
>> The short answer is that having the different syntaxes can lead to
>> more readable code (when used properly).
>>
>> Note that there are now 2 different (but somewhat similar) pipes
>> available in R (there could be more in some package(s) that I don't
>> know about, but will just talk about the main 2).
>>
>> The %>% pipe comes from the magrittr package, which many other
>> packages now import, but you need to load magrittr, either directly
>> or indirectly (e.g. via a package such as dplyr), before you can use
>> that pipe.  The magrittr pipe is a function call, so there is a small
>> increase in time and memory for using it, but it is a small fraction
>> of a second and a few bytes of memory, so you probably will not notice
>> the increased usage.
>>
>> The core R language now has a built-in pipe, |>, which is handled by
>> the parser, so there are no extra function calls and you do not need
>> to load any extra packages (though you need R 4.1.0 or later, released
>> in May 2021).
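A small base-R sketch of the difference described above (not from the original post). `quote()` captures an expression without evaluating it, so it shows what the parser produces. The names `x`, `f` and `y` are placeholders that are never evaluated, so magrittr does not need to be installed. Assumes R 4.1.0 or later.

```r
# The native pipe is rewritten by the parser: `x |> f(y)` becomes `f(x, y)`,
# so no pipe function is ever called at run time.
native <- quote(x |> f(y))
print(native)    # f(x, y)

# The magrittr pipe stays an ordinary call whose function is `%>%`,
# which is where its (small) run-time cost comes from.
magrittr_call <- quote(x %>% f(y))
print(magrittr_call[[1]])
```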
>>
>> The built-in |> pipe is a little pickier: you need to include the
>> parentheses in a function call, e.g. 1:10 |> mean(), whereas the magrittr
>> pipe works with either the call or the bare function name, e.g.
>> 1:10 %>% mean or 1:10 %>% mean().  This makes %>% a little easier to
>> use with anonymous functions.  If the previous result needs to be
>> passed to an argument other than the first, then %>% uses "." and |>
>> uses "_" (R 4.2.0 or later, and only as a named argument).
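A short sketch of these rules using only base R (R 4.2.0 or later is assumed for the `_` placeholder). The magrittr forms appear only as comments so the example runs without the package; the vector `x` is made-up data.

```r
x <- c(2, 8, 4)    # made-up data

# Native pipe: the right-hand side must be a call, so `x |> mean` is a
# parse error; write `x |> mean()`.  (magrittr also accepts `x %>% mean`.)
m <- x |> mean()

# Anonymous function with the native pipe: wrap the function in
# parentheses and then call it with ().
# (magrittr accepts x %>% (\(v) sum(v^2)) without the trailing ().)
ss <- x |> (\(v) sum(v^2))()

# Passing the left-hand side to a later argument: `_` must be *named*.
# (magrittr equivalent: mtcars %>% lm(mpg ~ wt, data = .))
fit <- mtcars |> lm(mpg ~ wt, data = _)
coef(fit)
```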
>>
>> The magrittr package has additional versions of the pipe and some
>> functions that wrap around common operators to make it easier to use
>> them with pipes, so there are still advantages to loading that package
>> if any of those are helpful.
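For illustration, a few of those extras, assuming the magrittr package is installed (the block does nothing otherwise): the tee pipe `%T>%`, the exposition pipe `%$%`, the assignment pipe `%<>%`, and the alias `add()` for `+`. Apart from the built-in `mtcars`, the data are made up.

```r
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)

  x <- c(1, 4, 9)    # made-up data

  # Alias functions wrap operators: add(1) behaves like `+ 1`.
  shifted <- x %>% sqrt() %>% add(1)    # 2 3 4

  # Tee pipe: cat() returns NULL, but %T>% passes x (not NULL) on to sum().
  total <- x %T>% cat("\n") %>% sum()   # 14

  # Exposition pipe: the columns of mtcars are visible by name.
  r <- mtcars %$% cor(mpg, wt)

  # Assignment pipe: sorts y and assigns the result back to y.
  y <- c(3, 1, 2)
  y %<>% sort()                         # y is now 1 2 3
}
```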
>>
>> For a simple case like your example, the pipe probably does not help
>> with readability much, but it helps more as we string more function
>> calls together.
>> For example, here are 3 ways to compute the geometric mean of the data
>> in a vector "x":
>>
>> exp(mean(log(x)))
>>
>> logx <- log(x)
>> mlx <- mean(logx)
>> exp(mlx)
>>
>> x |>
>>     log() |>
>>     mean() |>
>>     exp()
>>
>> These all do the same thing, but the first option is read from the
>> middle outward (which can be tricky) and is even more complicated if
>> you use additional arguments to any of the functions.
>> The second option reads top down, but requires creating intermediate
>> variables.  The last reads similar to the second, but without the
>> extra variables.  Spreading the series of function calls across
>> multiple rows makes it easier to read and easily lets you insert a
>> line like `print() |>` for debugging or checking intermediate results,
>> and single lines can easily be commented out to skip that step.
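A runnable version of the three forms, using a made-up vector `x` (R 4.1.0 or later for `|>`). The pipeline also shows the debugging trick: `print()` returns its argument invisibly, so it can be dropped in between two steps without changing the result.

```r
x <- c(1, 10, 100)    # made-up data; its geometric mean is 10

# 1. Nested calls, read from the inside out
g1 <- exp(mean(log(x)))

# 2. Intermediate variables, read top down
logx <- log(x)
mlx <- mean(logx)
g2 <- exp(mlx)

# 3. Native pipe, read top down without extra variables
g3 <- x |>
  log() |>
  print() |>    # debugging step: shows log(x) and passes it on unchanged
  mean() |>
  exp()
```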
>>
>> I have found myself using code like the following to compute a table,
>> print it, and compute the proportions all in one step:
>>
>> table(f, g) |>
>>    print() |>
>>    prop.table()
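A self-contained version of that idiom, with made-up vectors `f` and `g`. It works because `print()` returns its argument (invisibly), so the table itself, not the printed text, is passed on to `prop.table()`; with no `margin` argument, `prop.table()` divides every cell by the grand total.

```r
f <- c("a", "a", "b", "b", "b")    # made-up data
g <- c("x", "y", "x", "x", "y")

props <- table(f, g) |>
  print() |>        # shows the counts
  prop.table()      # each count divided by the total (5)
props
```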
>>
>> The pipes also work very well with the tidyverse, or even the tidy
>> data ideas without those packages where we use a single function for
>> each change, e.g. start with a data frame, select a subset of the
>> columns, filter to a subset of the rows, mutate a column, join to
>> another data frame, then pass the final result to a modeling function
>> like `lm` (and then pass that result to a summary function).  This is
>> nicely readable when each step is its own line.
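A sketch of such a pipeline in base R only (no tidyverse), using the built-in `mtcars` data. The lookup table `cyl_size` and the derived column `hp_per_wt` are made up for illustration, and R 4.2.0 or later is assumed for the `_` placeholder. Here `subset()`, `transform()` and `merge()` play the roles of select/filter, mutate and join.

```r
# Hypothetical lookup table to join on
cyl_size <- data.frame(cyl = c(4, 8), size = c("small", "large"))

fit_summary <- mtcars |>
  subset(select = c(mpg, wt, hp, cyl)) |>   # select a subset of columns
  subset(cyl != 6) |>                        # filter to a subset of rows
  transform(hp_per_wt = hp / wt) |>          # mutate: add a column
  merge(cyl_size, by = "cyl") |>             # join to another data frame
  lm(mpg ~ hp_per_wt + size, data = _) |>    # fit a model
  summary()                                  # summarise the fit
fit_summary
```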
>>
>> On Tue, Jan 3, 2023 at 9:49 AM Sorkin, John 
>> <jsorkin using som.umaryland.edu> wrote:
>>>
>>> I am trying to understand the reason for the existence of the pipe 
>>> operator, %>%, and when one should use it. It is my understanding 
>>> that the operator sends the object to the left of the operator to the 
>>> function immediately to the right of the operator:
>>>
>>> c(1:10) %>% mean results in a value of 5.5 which is exactly the same 
>>> as the result one obtains using the mean function directly, viz. 
>>> mean(c(1:10)). What is the reason for having two syntactically 
>>> different but semantically identical ways to call a function? Is one 
>>> more efficient than the other? Does one use less memory than the other?
>>>
>>> P.S. Please forgive what might seem to be a question with an obvious 
>>> answer. I am a programmer dinosaur. I have been programming for more 
>>> than 50 years. When I started programming in the 1960s the only pipe 
>>> one spoke about was a bong.
>>>
>>> John
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
> 
> Hello,
> 
> Not long ago, there was a (very) relevant post to r-devel [1] by 
> Paul Murrell linking to a YouTube video [2].
> 
> [1] https://stat.ethz.ch/pipermail/r-devel/2022-September/081959.html
> [2] https://youtu.be/IMpXB30MP48
> 
> Hope this helps,
> 
> Rui Barradas
> 
Hello,

Sorry, I forgot the link to the beginning of that r-devel thread.

https://stat.ethz.ch/pipermail/r-devel/2022-April/081636.html

Rui Barradas


