[R] Pipe operator

Tue Jan 3 20:00:46 CET 2023

Working off Avi's example - would:

  x |> cos() |> max(pi/4) |> round(3) |> assign("x", value = _)

...be even more intuitive to read? Or are there hidden problems with that?
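
One way to check is simply to run it. The sketch below assumes R >= 4.2.0, where the `_` placeholder for the native pipe became available; the sample vector and the names x0 and expected are made up for illustration and are not from Avi's message.

```r
# Assumes R >= 4.2.0 (needed for the `_` placeholder of the native pipe).
x  <- c(0, 0.5, 1, 2)   # hypothetical sample input
x0 <- x                 # keep a copy so the result can be compared later

# The native pipe is rewritten at parse time into an ordinary nested call.
print(identical(quote(x |> cos()), quote(cos(x))))

# `_` has to be supplied to a *named* argument, here `value`.
x |> cos() |> max(pi/4) |> round(3) |> assign("x", value = _)

# The same computation written as a nested call on the saved copy.
expected <- round(max(cos(x0), pi/4), 3)
print(identical(x, expected))
print(x)
```

When run at top level, this appears to behave the same as the nested form followed by an ordinary assignment. One thing to watch: max() collapses its arguments to a single number, so x ends up as a length-one vector regardless of how long it was before; pmax() would be the element-wise variant.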

Cheers,
Boris

> On 2023-01-03, at 12:40, avi.e.gross using gmail.com wrote:
> 
> John,
> 
> The topic has indeed been discussed here endlessly but new people still
> stumble upon it.
> 
> Until recently, the formal R language did not have built-in pipe
> functionality. Pipes were widely used through an assortment of packages,
> and there are quite a few variations on the theme, including different
> implementations.
> 
> Most existing code does use the operator %>% but there is now a built-in |>
> operator that is generally faster but is not as easy to use in a few cases.
> 
> Please forget the use of the word FILE here. Pipes are a form of syntactic
> sugar that generally is about the FIRST argument to a function. They are NOT
> meant to be used just for the trivial case you mention where indeed there is
> an easy way to do things. Yes, they work in such situations. But consider a
> deeply nested expression like this:
> 
> Result <- round(max(cos(x), 3.14159/4), 3)
> 
> There are MANY deeper nested expressions like this commonly used. The above
> can be written linearly as in
> 
> Temp1 <- cos(x)
> Temp2 <- max(Temp1, 3.14159/4)
> Result <- round(Temp2, 3)
> 
> Translation: for some variable x, calculate the cosine, take the maximum
> of that and pi/4, and round the result to three decimal places. This is not
> an uncommon kind of thing to do, and such expressions can be nested many
> layers deep; you can get hopelessly confused if they are not written
> somewhat linearly.
> 
> What pipes allow is to write this closer to the second way while not seeing
> or keeping any temporary variables around. The goal is to replace the FIRST
> argument to a function with whatever resulted as the value of the previous
> expression. That is often a vector or data.frame or list or any kind of
> object but can also be fairly complex as in a list of lists of matrices.
> 
> So you can still start with cos(x) OR you can write this where the x is
> removed from within and leaves cos() empty:
> 
> x %>% cos
> or
> x |> cos()
> 
> With the older magrittr pipe (%>%), the parentheses after cos are optional
> if there are no additional arguments, but the new native pipe (|>) requires
> them.
> 
> So continuing the above, using multiple lines, the pipe looks like:
> 
> Result <-
>  x %>%
>  cos() %>%
>  max(3.14159/4) %>%
>  round(3)
> 
> This gives the same result but is arguably easier for some to read and
> follow. Nobody forces you to use it and for simple cases, most people don't.
> 
> There is a group of packages called the tidyverse that makes heavy, routine
> use of pipes: most or all of its functions take the object normally piped
> to them as their first argument. That makes it very handy to write code that
> reads your data into a variable (often a data.frame or tibble), PIPES IT to
> a function that renames some columns, PIPES the modified object to a
> function that retains only selected rows, pipes that to a function that
> drops some of the columns, pipes that to a function that groups or sorts
> the items, and pipes that to a function that joins it with another object,
> generates a report, or does any of many other things.
> 
> So the real answer is that piping is another WAY of doing things from a
> programmer's perspective. Underneath it all, it is mostly syntactic sugar and
> the interpreter rearranges your code and performs the steps in what seems
> like a different order at times. Generally, you do not need to care.
> 
> 
> 
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Sorkin, John
> Sent: Tuesday, January 3, 2023 11:49 AM
> To: 'R-help Mailing List' <r-help using r-project.org>
> Subject: [R] Pipe operator
> 
> I am trying to understand the reason for existence of the pipe operator,
> %>%, and when one should use it. It is my understanding that the operator
> sends the file to the left of the operator to the function immediately to
> the right of the operator:
> 
> c(1:10) %>% mean results in a value of 5.5 which is exactly the same as the
> result one obtains using the mean function directly, viz. mean(c(1:10)).
> What is the reason for having two syntactically different but semantically
> identical ways to call a function? Is one more efficient than the other?
> Does one use less memory than the other? 
> 
> P.S. Please forgive what might seem to be a question with an obvious answer.
> I am a programmer dinosaur. I have been programming for more than 50 years.
> When I started programming in the 1960s the only pipe one spoke about was a
> bong.  
> 
> John
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

--
Boris Steipe MD, PhD

Professor em.
Department of Biochemistry 
Temerty Faculty of Medicine
University of Toronto