[R] Pipe operator

Rui Barradas
Tue Jan 3 20:14:49 CET 2023


At 17:35 on 03/01/2023, Greg Snow wrote:
> To expand a little on Christopher's answer.
> 
> The short answer is that having the different syntaxes can lead to
> more readable code (when used properly).
> 
> Note that there are now 2 different (but somewhat similar) pipes
> available in R (there could be more in some package(s) that I don't
> know about, but will just talk about the main 2).
> 
> The %>% pipe comes from the magrittr package, but many other packages
> now import that package.  But you need to load the magrittr package,
> either directly or indirectly, before you can use that pipe.  The
> magrittr pipe is a function call, so there is a small increase in time
> and memory for using it, but it is a small fraction of a second and a
> few bytes of memory, so you probably will not notice the increased
> usage.
> 
> The core R language now has a built in pipe |> which is handled by the
> parser, so no extra function calls and you do not need to load any
> extra packages (though you need a somewhat recent version of R: the
> native pipe was added in R 4.1.0).
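
A small sketch (not from the original post) of what "handled by the parser" means in practice: quoting a piped expression shows that R has already rewritten it into an ordinary nested call before anything is evaluated. The names `piped` and `nested` are just illustrative, and the code assumes R 4.1.0 or later.

```r
# The parser turns the native pipe into a plain nested call, so the
# quoted (unevaluated) expressions below are the same language object.
piped  <- quote(x |> log() |> mean() |> exp())
nested <- quote(exp(mean(log(x))))

print(piped)               # displays the nested form of the call
identical(piped, nested)   # compares the two parsed expressions
```

Because the rewrite happens at parse time, there is no pipe function left to call when the code runs.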
> 
> The built-in |> pipe is a little pickier: you need to include the
> parentheses in a function call, e.g. 1:10 |> mean(), whereas the magrittr
> pipe can work with that call or the function without parentheses, e.g.
> 1:10 %>% mean or 1:10 %>% mean().  This makes %>% a little easier to
> use with anonymous functions.  If the previous return value needs to be
> passed to an argument other than the first, then %>% uses "." and |>
> uses "_" (available from R 4.2.0, and only as a named argument).
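
To make the rules for the native pipe concrete, here is a minimal base R sketch. It is an illustration added to the thread, not code from the original post; the `mtcars` data set and the names `m1`, `squares` and `fit` are arbitrary choices, and the last statement assumes R 4.2.0 or later.

```r
# The right-hand side of |> must be a call, so the parentheses are required.
m1 <- 1:10 |> mean()

# An anonymous function has to be wrapped in parentheses and then called.
squares <- 1:3 |> (\(v) v^2)()

# The "_" placeholder sends the piped value to a named argument
# other than the first (here, the data argument of lm).
fit <- mtcars |> lm(mpg ~ wt, data = _)

m1
squares
coef(fit)
```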
> 
> The magrittr package has additional versions of the pipe and some
> functions that wrap around common operators to make it easier to use
> them with pipes, so there are still advantages to loading that package
> if any of those are helpful.
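
For readers who have not seen the extra magrittr pipes and operator wrappers, a short sketch follows. It assumes the magrittr package is installed, and the data (`x`, `mtcars`) and result names are only illustrative.

```r
library(magrittr)  # assumption: the magrittr package is installed

# "." placeholder: pass the left-hand side to an argument other than the first.
fit <- mtcars %>% lm(mpg ~ wt, data = .)

# Tee pipe %T>%: run print() for its side effect, then carry on with
# the value that went into it.
x <- c(1, 10, 100)
gm <- x %>% log() %T>% print() %>% mean() %>% exp()

# Exposition pipe %$%: expose the columns of a data frame by name.
r <- mtcars %$% cor(mpg, wt)

# Wrappers around common operators, convenient inside a pipeline.
y <- 1:3 %>% multiply_by(2) %>% add(1)
y
```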
> 
> For a simple case like your example, the pipe probably does not help
> with readability much, but it helps more as we string more function
> calls together.
> For example, here are 3 ways to compute the geometric mean of the data
> in a vector "x":
> 
> exp(mean(log(x)))
> 
> logx <- log(x)
> mlx <- mean(logx)
> exp(mlx)
> 
> x |>
>     log() |>
>     mean() |>
>     exp()
> 
> These all do the same thing, but the first option is read from the
> middle outward (which can be tricky) and is even more complicated if
> you use additional arguments to any of the functions.
> The second option reads top down, but requires creating intermediate
> variables.  The last reads similarly to the second, but without the
> extra variables.  Spreading the series of function calls across
> multiple lines makes it easier to read and easily lets you insert a
> line like `print() |>` for debugging or checking intermediate results,
> and single lines can easily be commented out to skip that step.
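
As a quick check of the point about debugging lines, here is a sketch with made-up data (`x` is chosen so that the geometric mean is 10). It is not from the original post; it relies only on the fact that `print()` returns its argument invisibly, so the value keeps flowing down the pipeline.

```r
x <- c(1, 10, 100)   # made-up data for illustration

# Nested form, read from the inside out.
gm_nested <- exp(mean(log(x)))

# Piped form with a temporary print() step inserted for inspection.
gm_piped <- x |>
    log() |>
    print() |>    # shows the logged values, then passes them on unchanged
    mean() |>
    exp()

all.equal(gm_nested, gm_piped)
```

Deleting or commenting out the `print() |>` line leaves the rest of the pipeline untouched.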
> 
> I have found myself using code like the following to compute a table,
> print it, and compute the proportions all in one step:
> 
> table(f, g) |>
>    print() |>
>    prop.table()
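
A runnable version of that pattern, as a sketch: the factors `f` and `g` are built here from the built-in `mtcars` data purely for illustration (they are not the poster's data), and `props` is a hypothetical name for the result.

```r
# Two illustrative factors: number of cylinders and transmission type.
f <- factor(mtcars$cyl)
g <- factor(mtcars$am, labels = c("automatic", "manual"))

props <- table(f, g) |>
    print() |>       # prints the counts and passes the table along
    prop.table()     # converts the counts to proportions of the total

props
```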
> 
> The pipes also work very well with the tidyverse, or even the tidy
> data ideas without those packages where we use a single function for
> each change, e.g. start with a data frame, select a subset of the
> columns, filter to a subset of the rows, mutate a column, join to
> another data frame, then pass the final result to a modeling function
> like `lm` (and then pass that result to a summary function).  This is
> nicely readable when each step is its own line.
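
The same one-step-per-line style can be sketched without any tidyverse packages. The example below is an illustration only: it uses the built-in `mtcars` data, base R stand-ins (`subset`, `transform`) for the select, filter and mutate steps, skips the join, and assumes R 4.2.0 or later for the `_` placeholder. The column `wt_kg` and the name `model_summary` are made up.

```r
model_summary <- mtcars |>
    subset(cyl == 4, select = c(mpg, wt)) |>   # filter rows, select columns
    transform(wt_kg = wt * 453.592) |>         # wt is in 1000 lb; convert to kg
    lm(mpg ~ wt_kg, data = _) |>               # data is not the first argument
    summary()

model_summary
```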
> 
> On Tue, Jan 3, 2023 at 9:49 AM Sorkin, John <jsorkin using som.umaryland.edu> wrote:
>>
>> I am trying to understand the reason for existence of the pipe operator, %>%, and when one should use it. It is my understanding that the operator sends the file to the left of the operator to the function immediately to the right of the operator:
>>
>> c(1:10) %>% mean results in a value of 5.5 which is exactly the same as the result one obtains using the mean function directly, viz. mean(c(1:10)). What is the reason for having two syntactically different but semantically identical ways to call a function? Is one more efficient than the other? Does one use less memory than the other?
>>
>> P.S. Please forgive what might seem to be a question with an obvious answer. I am a programmer dinosaur. I have been programming for more than 50 years. When I started programming in the 1960s the only pipe one spoke about was a bong.
>>
>> John
> 
> 
> 

Hello,

Not long ago, there was a (very) relevant post to r-devel [1] by
Paul Murrell linking to a YouTube video [2].

[1] https://stat.ethz.ch/pipermail/r-devel/2022-September/081959.html
[2] https://youtu.be/IMpXB30MP48

Hope this helps,

Rui Barradas


