[Rd] should base R have a piping operator ?

Ant F antoine.fabri using gmail.com
Mon Oct 7 00:31:18 CEST 2019


As a matter of fact I played a few days ago with this idea of transforming
the pipe chain to a sequence of calls such as the one Gabriel proposed.

My proposed debugging method was to use a debugging pipe, `%B>%`.

Calling iris %>% head %B>% dim %>% length will place you right at the
browser call below:

#> Called from: (function (.)
#> {
#>     on.exit(rm(.))
#>     . <- head(.)
#>     browser()
#>     . <- dim(.)
#>     . <- length(.)
#>     .
#> })(iris)

https://github.com/moodymudskipper/pipe/blob/master/README.md
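
For what it's worth, the general idea of turning a pipe chain into a sequence
of calls can be sketched in a few lines of base R. This is only a minimal
illustration, not the code of the package linked above: `unroll_pipe()` is a
made-up name, and it assumes a plain `%>%` that always inserts the left-hand
side as the first argument (no explicit dots, no `%B>%`, none of magrittr's
special cases).

```r
# Hypothetical helper, for illustration only: rewrite an unevaluated pipe
# chain as a `{` block of sequential assignments to `.`.
unroll_pipe <- function(expr) {
  steps <- list()
  # Pipes are left-associative, so walk down the left spine of the call
  # tree: ((a %>% b) %>% c) %>% d
  while (is.call(expr) && identical(expr[[1]], as.name("%>%"))) {
    steps <- c(list(expr[[3]]), steps)
    expr <- expr[[2]]
  }
  # `expr` is now the left-most operand, i.e. the input of the chain
  calls <- lapply(steps, function(rhs) {
    if (is.name(rhs)) {
      # a bare function name such as `head` becomes `head(.)`
      rhs <- as.call(list(rhs, quote(.)))
    } else {
      # a call such as `f(x)` becomes `f(., x)`
      rhs <- as.call(c(list(rhs[[1]], quote(.)), as.list(rhs)[-1]))
    }
    call("<-", quote(.), rhs)
  })
  # Assemble: { . <- input; . <- step1(.); ...; . }
  as.call(c(list(as.name("{"), call("<-", quote(.), expr)),
            calls,
            list(quote(.))))
}

chain <- quote(iris %>% head %>% dim %>% length)
unrolled <- unroll_pipe(chain)
print(unrolled)

# Evaluate in a fresh environment so `.` does not leak into the workspace
result <- eval(unrolled, new.env())
```

Because the result is an ordinary block of assignments, one can step through
it or insert a browser() call between any two steps, which is the point of
the transformation.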

Regarding breaking code, it would only do so if the pipe is named the same.

To be clear, I like the fact that magrittr exists as an external package
and that it can evolve with the thought and input of the tidyverse crew and
I wouldn't want a base pipe to replace it.
I think package developers would code and document using the base pipe
(unless they have a strong preference for a packaged pipe), and that users
would interactively use whichever pipe they prefer, which is usually
magrittr's pipe among current choices.

Thanks all for the good points,

Antoine



On Sun, Oct 6, 2019 at 22:56, Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 05/10/2019 7:50 p.m., Gabriel Becker wrote:
> > Hi all,
> >
> > I think there's some nuance here that makes me agree partially with
> > each "side".
> >
> > The pipe is inarguably extremely popular. Many probably think of it as a
> > core feature of R, along with the tidyverse that (as was pointed out)
> > largely surrounds it and drives its popularity. Whether it's a good or bad
> > thing that they think that doesn't change the fact that, by my estimation,
> > Ant is correct that they do. BUT, I don't agree with him that that, by
> > itself, is a reason to put it in base R in the form that it exists now. For
> > the current form, there aren't really any major downsides that I see to
> > having people just use the package version.
> >
> > Sure it may be a little weird, but it doesn't ever really stop
> > people from using it or present a significant barrier. Another major point
> > is that many (most?) base R functions are not necessarily tooled to be
> > endomorphic, which in my personal opinion is *largely* the only place that
> > the pipes are really compelling.
> >
> > That was for pipes as they exist in package space, though. There is another
> > way the pipe could go into base R that could not be done in package space
> > and has the potential to mitigate some pretty serious downsides to the
> > pipes relating to debugging, which would be to implement them in the parser.
>
> Actually, that could be done in package space too:  just write a
> function to do the transformation.  That is, something like
>
>     transformPipe( a %>% b %>% c )
>
> could convert the original expression into one like yours below.  This
> could be done by a smart IDE like RStudio without the user typing anything.
>
> A really strong argument for doing this in a package instead of Bison/C
> code in the parser is the help page ?magrittr::"%>%".  There are so many
> special cases there that it's certainly hard and possibly impossible for
> the parser to do the transformation:  I think some parts of the
> transformation depend on run-time values, not syntax.
>
> Of course, a simpler operator like Antoine's would be easier, but that
> would break code that uses magrittr pipes, and I think those are the
> most commonly accepted ones.
>
> So a workable plan would be for all the pipe authors to agree on syntax
> for transformPipe(), and then for IDE authors to support it.  R Core
> doesn't need to be involved at all unless they want to update Rgui or
> R.app or command line R.
>
> Duncan Murdoch
>
> >
> > If
> >
> > iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
> >   filter(mean_sl > 5)
> >
> >
> > were *parsed*, for example, into
> >
> > local({
> >              . = group_by(iris, Species)
> >
> >              ._tmp2 = summarize(., mean_sl = mean(Sepal.Length))
> >
> >              filter(._tmp2, mean_sl > 5)
> >         })
> >
> >
> >
> >
> > Then debugging (once you knew that) would be much easier but behavior
> > would be the same as it is now. There could even be some sort of
> > step-through-pipe debugger at that point added as well for additional
> > convenience.
> >
> > There is some minor precedent for that type of transformative parsing:
> >
> >> expr = parse(text = "5 -> x")
> >
> >> expr
> >
> > expression(5 -> x)
> >
> >> expr[[1]]
> >
> > x <- 5
> >
> >
> > Though that's a much more minor transformation.
> >
> > All of that said, I believe Jim Hester (cc'ed) suggested something along
> > these lines at the RSummit a couple of years ago, and thus far R-core has
> > not shown much appetite for changing things in the parser.
> >
> > Without that changing, I'd have to say that my vote, for whatever it's
> > worth, comes down on the side of pipes being fine in packages. A summary of
> > my reasoning being that it only makes sense for them to go into R itself if
> > doing so fixes an issue that can't be fixed with them in package space.
> >
> > Best,
> > ~G
> >
> >
> >
> > On Sun, Oct 6, 2019 at 5:26 AM Ant F <antoine.fabri using gmail.com> wrote:
> >
> >> Yes but this exaggeration precisely misses the point.
> >>
> >> Concerning your examples:
> >>
> >> * I love fread but I think it makes a lot of subjective choices that are
> >> best associated with a package. I think it
> >> changed a lot with time and can still change, and we have great developers
> >> willing to maintain it and be reactive
> >> regarding feature requests or bug reports.
> >>
> >> * group_by() adds a class that works only (or mostly) with tidyverse verbs,
> >> so it's very easy to dismiss it as a candidate for inclusion in base R.
> >>
> >> * summarize is an alternative to aggregate, and it would be very confusing
> >> to have both.
> >>
> >> Now to be fair to your argument we could think of other functions such as
> >> data.table::rleid() which I believe base R misses deeply,
> >> and there is nothing wrong with packaged functions making their way to base
> >> R.
> >>
> >> Maybe there's an existing list of criteria for inclusion in base R, but if
> >> not I can make one up for the sake of this discussion :) :
> >> * 1) the functionality should not already exist
> >> * 2) the function should be general enough
> >> * 3) the function should have a large number of potential users
> >> * 4) the function should be robust, and not require extensive maintenance
> >> * 5) the function should be stable, we shouldn't expect new features every 2
> >> months
> >> * 6) the function should have an intuitive interface in the context of the
> >> rest of base R
> >>
> >> I guess 1 and 6 could be held against my proposal, because :
> >> (1) everything can be done without pipes
> >> (6) They are somewhat surprising (though with explicit dots not that much,
> >> and not more surprising than say `bquote()`)
> >>
> >> In my opinion the pluses offset the minuses.
> >>
> >> I wouldn't advise taking magrittr's pipe (provided the license allows it),
> >> for instance, because it makes a lot of design choices and has a complex
> >> behavior. What I propose is 2 lines of code, very unlikely to evolve or
> >> require maintenance.
> >>
> >> Antoine
> >>
> >> PS: I just receive the digest once a day, so if you don't "reply all" I can
> >> only react later.
> >>
> >> On Sat, Oct 5, 2019 at 19:54, Hugh Marera <hugh.marera using gmail.com> wrote:
> >>
> >>> I exaggerated the comparison for effect. However, it is not very difficult
> >>> to find functions in dplyr or data.table or indeed other packages that one
> >>> may wish to be in base R. Examples, for me, could include
> >>> data.table::fread, dplyr::group_by & dplyr::summari[sZ]e combo, etc. Also,
> >>> the "popularity" of magrittr::`%>%` is mostly attributable to the tidyverse
> >>> (an advanced superset of R). Many R users don't even know that they are
> >>> installing the magrittr package.
> >>>
> >>> On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar <iucar using fedoraproject.org> wrote:
> >>>
> >>>> On Sat, 5 Oct 2019 at 17:15, Hugh Marera <hugh.marera using gmail.com> wrote:
> >>>>>
> >>>>> How is your argument different to, say,  "Should dplyr or data.table be
> >>>>> part of base R as they are the most popular data science packages and they
> >>>>> are used by a large number of users?"
> >>>>
> >>>> Two packages with many features, dozens of functions and under heavy
> >>>> development to fix bugs, add new features and improve performance, vs.
> >>>> a single operator with a limited and well-defined functionality, and a
> >>>> reference implementation that hasn't changed in years (but certainly
> >>>> hackish in a way that probably could only be improved from R itself).
> >>>>
> >>>> Can't you really spot the difference?
> >>>>
> >>>> Iñaki
> >>>>
> >>>
> >>
> >>          [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>