[Rd] New pipe operator
@v|gro@@ @end|ng |rom ver|zon@net
Sun Dec 6 16:29:49 CET 2020
Naming is another whole topic.
I have seen suggestions that the pipeline symbol currently in use be read aloud as THEN, so
data %>% f1 %>% f2()
would be said as something like:
take data then apply f1 then f2
or some variants.
Words other than pipe or pipeline might also work, such as "assembly line" or "conveyor belt", and some fit certain kinds of pipelining better than others. My original exposure to UNIX in the early '80s involved pipelines of multiple processes whose standard input and/or standard output (and sometimes also standard error) were redirected to an anonymous "pipe" device. The pipe buffered whatever data (usually text) was thrown at it, and the processes reading from and writing to it were paused and restarted as needed when data was ready. Problems could often be decomposed into multiple parts, each solved by some program, and it was not unusual to do something like:
cat *.c | grep -v ... | grep ... | sed ... | cut ... >output
Of course, something like the above was often rewritten to be done within a single awk or perl script or whatever. You could still view the above from the perspective of "data" in some form, often text, being passed from one function(ality) to another and changing a bit at each step of the way. A very common use of this form of pipeline was to deal with embedded text in a different language in typesetting:
tbl filename | eqn | pic | troff | ...
The above would open a file and pass through all lines except those between markers that specified where a table started and ended. Those lines would be processed and transformed into their troff language equivalent. The old plus new lines then went to eqn, which found and transformed equations similarly, then to pic, which transformed the instructions it knew into image descriptions in troff, and finally troff processed the whole mess and sent it off to the printer.
Clearly the above can be seen as a data pipeline using full processes as nodes.
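For anyone who wants to try the first kind of pipeline, here is a small runnable sketch of the same shape. The original command elided its patterns, so everything specific here (the sample text standing in for "cat *.c", the grep and sed patterns, the cut range) is a hypothetical choice of mine, not what was actually run back then.

```shell
#!/bin/sh
# Hypothetical sample input standing in for the "cat *.c" stage.
sample='#include <stdio.h>
// a comment line
int main(void) {
    return 0; // done
}'

# Each stage reads standard input and writes standard output:
#   grep -v  drops lines that begin with a // comment
#   grep -E  keeps lines that mention "main" or "return"
#   sed      strips a trailing // comment and the spaces before it
#   cut      keeps at most the first 20 characters of each line
result=$(printf '%s\n' "$sample" \
  | grep -v '^//' \
  | grep -E 'main|return' \
  | sed 's|[[:space:]]*//.*$||' \
  | cut -c1-20)

printf '%s\n' "$result"
```

Each stage knows nothing about its neighbors beyond the text arriving on standard input, which is what makes the stages easy to swap or reorder.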
The way R is using the pipeline may just use functions, but you can imagine it as having both similarities and differences. Current implementations may be linear, with lazy evaluation and with every part running to completion before the next part starts. Every "object" is fully made, then used, then often removed as a temporary object. There is no buffering. But in principle, you can make UNIX-like pipelines using parallelism within a process too.
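To make the contrast concrete, here is a rough shell analogy (not a description of how R implements anything). The "staged" half finishes each step and keeps a whole intermediate result in a temporary file before the next step starts; the "streamed" half connects the same steps with pipes so data can flow between concurrently running processes. The three steps and the numbers are made up for illustration.

```shell
#!/bin/sh
# Hypothetical three-step job: emit 1..5, double each value, sum them.
tmpdir=$(mktemp -d)

# Staged style: every step runs to completion and its full result is
# stored before the next step begins.
printf '%s\n' 1 2 3 4 5 > "$tmpdir/step1"
awk '{ print $1 * 2 }' "$tmpdir/step1" > "$tmpdir/step2"
staged=$(awk '{ s += $1 } END { print s }' "$tmpdir/step2")
rm -r "$tmpdir"   # the temporaries are discarded once they have been used

# Streamed style: the same steps joined by anonymous pipes; the kernel
# buffers data between the processes, which run at the same time.
streamed=$(printf '%s\n' 1 2 3 4 5 \
  | awk '{ print $1 * 2 }' \
  | awk '{ s += $1 } END { print s }')

printf 'staged=%s streamed=%s\n' "$staged" "$streamed"
```

Both halves compute the same answer; the difference is only in when the intermediate data exists and whether anything can overlap.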
Would there be scenarios where phrases like "assembly line" or "conveyor belt" describe the method properly? The word pipe suggests linearity to some, whereas conveyor belts these days can also be used to selectively shunt things one way or another, as in assembling all parts of your order from different parts of a warehouse and arranging that they all end up in the same delivery area. Making applications do that dynamically may have other names. Think flowchart!
Time to go do something useful.
From: R-devel <r-devel-bounces using r-project.org> On Behalf Of Hiroaki Yutani
Sent: Saturday, December 5, 2020 10:29 PM
To: Abby Spurdle <spurdle.a using gmail.com>
Cc: r-devel <r-devel using r-project.org>
Subject: Re: [Rd] New pipe operator
It is common practice to call |> a pipe (or pipeline operator) in many languages, including ones that recently introduced it as an experimental feature.
Pipeline is a common feature of functional programming, not just of the "data pipeline."
(This blog post about the history of pipe operator might be
I agree this is a bit confusing for those who are familiar with other "pipe" concepts, but there's no other appropriate term for |>.
On Sun, Dec 6, 2020 at 12:22, Gregory Warnes <greg using warnes.net> wrote:
> If we’re being mathematically pedantic, the “pipe” operator is
> actually function composition.
> That being said, pipes are a simple and well-known idiom. While being less
> than mathematically exact, it seems a reasonable label for the (very
> useful) behavior.
> On Sat, Dec 5, 2020 at 9:43 PM Abby Spurdle <spurdle.a using gmail.com> wrote:
> > > This is a good addition
> > I can't understand why so many people are calling this a "pipe".
> > Pipes connect processes, via their I/O streams.
> > Arguably, a more general interpretation would include sockets and files.
> > https://en.wikipedia.org/wiki/Pipeline_(Unix)
> > https://en.wikipedia.org/wiki/Named_pipe
> > https://en.wikipedia.org/wiki/Anonymous_pipe
> > As far as I can tell, the magrittr-like operators are functions (not
> > pipes), with nonstandard syntax.
> > This is not consistent with R's original design philosophy, building
> > on C, Lisp and S, along with lots of *important* math and stats.
> > It's possible that some parties are interested in creating a kind of
> > "data pipeline".
> > I'm interested in this myself, and I think we could discuss this more.
> > But I'm not convinced the magrittr-like operators help to achieve
> > this goal.
> > Which, in my opinion, would require one to model programs as
> > directed graphs, along with some degree of asynchronous input.
> > Presumably, these operators will be added to R anyway, and (almost)
> > no one will listen to me.
> > So, I would like to make one suggestion:
> > Is it possible for these operators to *not* be named:
> > The R Pipe
> > The S Pipe
> > Or anything with a similar meaning.
> > Maybe tidy pipe, or something else that links it to its proponents?
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> "Whereas true religion and good morals are the only solid foundations
> of public liberty and happiness . . . it is hereby earnestly
> recommended to the several States to take the most effectual measures
> for the encouragement thereof." Continental Congress, 1778