[Rd] should base R have a piping operator ?

Duncan Murdoch
Sun Oct 6 22:56:29 CEST 2019


On 05/10/2019 7:50 p.m., Gabriel Becker wrote:
> Hi all,
> 
> I think there's some nuance here that makes me agree partially with
> each "side".
> 
> The pipe is inarguably extremely popular. Many probably think of it as a
> core feature of R, along with the tidyverse that (as was pointed out)
> largely surrounds it and drives its popularity. Whether it's a good or bad
> thing that they think that doesn't change the fact that, by my estimation,
> Ant is correct that they do. BUT, I don't agree with him that that, by
> itself, is a reason to put it in base R in the form that it exists now. For
> the current form, there aren't really any major downsides that I see to
> having people just use the package version.
> 
> Sure it may be a little weird, but it doesn't ever really stop
> people from using it or present a significant barrier. Another major point
> is that many (most?) base R functions are not necessarily tooled to be
> endomorphic, which in my personal opinion is *largely* the only place that
> the pipes are really compelling.
> 
> That was for pipes as they exist in package space, though. There is another
> way the pipe could go into base R that could not be done in package space
> and has the potential to mitigate some pretty serious downsides to the
> pipes relating to debugging, which would be to implement them in the parser.

Actually, that could be done in package space too:  just write a 
function to do the transformation.  That is, something like

    transformPipe( a %>% b %>% c )

could convert the original expression into one like yours below.  This 
could be done by a smart IDE like RStudio without the user typing anything.
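
To make the idea concrete, here is a minimal sketch of what such a function 
might look like.  It is an illustration only, not magrittr's implementation: 
it assumes every right-hand side is a call or a bare function name and 
always inserts the piped value as the first argument.  It handles none of 
the special cases on the magrittr help page (explicit dot placement, 
braces, lambdas, and so on), and the details of the body are my assumptions.

```r
# Hypothetical sketch: rewrite a chain of %>% calls into a local() block
# of sequential assignments to `.`, without evaluating anything.
transformPipe <- function(expr) {
  expr <- substitute(expr)
  pipe <- as.name("%>%")
  dot  <- as.name(".")

  # a %>% b %>% c parses as (a %>% b) %>% c, so peel stages off the right.
  steps <- list()
  while (is.call(expr) && identical(expr[[1]], pipe)) {
    steps <- c(list(expr[[3]]), steps)
    expr  <- expr[[2]]
  }
  if (length(steps) == 0) return(expr)   # no pipe: nothing to transform

  body <- list(call("<-", dot, expr))    # . <- a
  for (i in seq_along(steps)) {
    rhs <- steps[[i]]
    if (is.name(rhs)) rhs <- as.call(list(rhs))        # f  becomes  f()
    # Insert `.` as the first argument, keeping the other (named) arguments.
    stage <- as.call(c(list(rhs[[1]], dot), as.list(rhs)[-1]))
    if (i < length(steps)) stage <- call("<-", dot, stage)
    body <- c(body, list(stage))
  }
  as.call(list(as.name("local"), as.call(c(list(as.name("{")), body))))
}

# Usage: the result is an unevaluated call that can be inspected or run.
out <- transformPipe(c(1, 4, 9) %>% sqrt %>% sum())
print(out)
eval(out)
```

Since the argument is captured with substitute(), magrittr does not need to 
be attached for the transformation itself; %>% is only ever seen as syntax.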

A really strong argument for doing this in a package instead of Bison/C 
code in the parser is the help page ?magrittr::"%>%".  There are so many 
special cases there that it's certainly hard and possibly impossible for 
the parser to do the transformation:  I think some parts of the 
transformation depend on run-time values, not syntax.

Of course, a simpler operator like Antoine's would be easier, but that 
would break code that uses magrittr pipes, and I think those are the 
most commonly accepted ones.

So a workable plan would be for all the pipe authors to agree on syntax 
for transformPipe(), and then for IDE authors to support it.  R Core 
doesn't need to be involved at all unless they want to update Rgui or 
R.app or command line R.

Duncan Murdoch

> 
> If
> 
> iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
> filter(mean_sl > 5)
> 
> 
> were *parsed*, for example, into
> 
> local({
>              . = group_by(iris, Species)
> 
>              . = summarize(., mean_sl = mean(Sepal.Length))
> 
>              filter(., mean_sl > 5)
>         })
> 
> 
> 
> 
> Then debugging (once you knew that) would be much easier but behavior
> would be the same as it is now. There could even be some sort of
> step-through-pipe debugger at that point added as well for additional
> convenience.
> 
> There is some minor precedent for that type of transformative parsing:
> 
> > expr = parse(text = "5 -> x")
> > expr
> expression(5 -> x)
> > expr[[1]]
> x <- 5
> 
> 
> Though that's a much more minor transformation.
> 
> All of that said, I believe Jim Hester (cc'ed) suggested something along
> these lines at the RSummit a couple of years ago, and thus far R-core has
> not shown much appetite for changing things in the parser.
> 
> Without that changing, I'd have to say that my vote, for whatever it's
> worth, comes down on the side of pipes being fine in packages. A summary of
> my reasoning being that it only makes sense for them to go into R itself if
> doing so fixes an issue that can't be fixed with them in package space.
> 
> Best,
> ~G
> 
> 
> 
> On Sun, Oct 6, 2019 at 5:26 AM Ant F <antoine.fabri using gmail.com> wrote:
> 
>> Yes but this exaggeration precisely misses the point.
>>
>> Concerning your examples:
>>
>> * I love fread but I think it makes a lot of subjective choices that are
>> best associated with a package. I think it
>> changed a lot with time and can still change, and we have great developers
>> willing to maintain it and be reactive
>> regarding feature requests or bug reports
>>
>> * group_by() adds a class that works only (or mostly) with tidyverse verbs,
>> so it's very easy to dismiss as a candidate for inclusion in base R.
>>
>> * summarize is an alternative to aggregate; it would be very confusing to
>> have both
>>
>> Now to be fair to your argument we could think of other functions such as
>> data.table::rleid() which I believe base R misses deeply,
>> and there is nothing wrong with packaged functions making their way to base
>> R.
>>
>> Maybe there's an existing list of criteria for inclusion in base R, but if
>> not I can make one up for the sake of this discussion :) :
>> * 1) the functionality should not already exist
>> * 2) the function should be general enough
>> * 3) the function should have a large number of potential users
>> * 4) the function should be robust, and not require extensive maintenance
>> * 5) the function should be stable, we shouldn't expect new features every 2
>> months
>> * 6) the function should have an intuitive interface in the context of the
>> rest of base R
>>
>> I guess 1 and 6 could be held against my proposal, because :
>> (1) everything can be done without pipes
>> (6) They are somewhat surprising (though with explicit dots not that much,
>> and not more surprising than say `bquote()`)
>>
>> In my opinion the + offset the -.
>>
>> I wouldn't advise taking magrittr's pipe (provided the license allows it),
>> for instance, because it makes a lot of design choices and has complex
>> behavior; what I propose is 2 lines of code very unlikely to evolve or
>> require maintenance.
>>
>> Antoine
>>
>> PS: I just receive the digest once a day so if you don't "reply all" I can
>> only react later.
>>
>> On Sat, Oct 5, 2019 at 19:54, Hugh Marera <hugh.marera using gmail.com> wrote:
>>
>>> I exaggerated the comparison for effect. However, it is not very difficult
>>> to find functions in dplyr or data.table or indeed other packages that one
>>> may wish to be in base R. Examples, for me, could include
>>> data.table::fread, dplyr::group_by & dplyr::summari[sZ]e combo, etc. Also,
>>> the "popularity" of magrittr::`%>%` is mostly attributable to the tidyverse
>>> (an advanced superset of R). Many R users don't even know that they are
>>> installing the magrittr package.
>>>
>>> On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar <iucar using fedoraproject.org> wrote:
>>>
>>>> On Sat, 5 Oct 2019 at 17:15, Hugh Marera <hugh.marera using gmail.com> wrote:
>>>>>
>>>>> How is your argument different to, say, "Should dplyr or data.table be
>>>>> part of base R as they are the most popular data science packages and they
>>>>> are used by a large number of users?"
>>>>
>>>> Two packages with many features, dozens of functions and under heavy
>>>> development to fix bugs, add new features and improve performance, vs.
>>>> a single operator with a limited and well-defined functionality, and a
>>>> reference implementation that hasn't changed in years (but certainly
>>>> hackish in a way that probably could only be improved from R itself).
>>>>
>>>> Can't you really spot the difference?
>>>>
>>>> Iñaki
>>>>
>>>
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>