[Rd] the pipe |> and line breaks in pipelines

Bill Dunlap w||||@mwdun|@p @end|ng |rom gm@||@com
Wed Dec 9 23:42:43 CET 2020


When I am debugging a function with code like
    x <- f1(x)
    x <- f2(x)
    result <- f3(x)
I will often slip a line like '.GlobalEnv$tmp1 <- x' between the first two
lines and '.GlobalEnv$tmp2 <- x' between the last two lines, and later look
at the intermediate results, 'tmp1' and 'tmp2' in the global environment, to
see what is going on.
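Here is a minimal, self-contained sketch of that debugging pattern. The functions f1(), f2() and f3() are hypothetical stand-ins (drop missing values, sort, sum) and do not come from any real code:

```r
# Hypothetical stand-ins for the real processing steps
f1 <- function(x) x[!is.na(x)]  # drop missing values
f2 <- function(x) sort(x)       # sort ascending
f3 <- function(x) sum(x)        # reduce to a single number

process <- function(x) {
    x <- f1(x)
    .GlobalEnv$tmp1 <- x        # debugging: keep a copy of this stage
    x <- f2(x)
    .GlobalEnv$tmp2 <- x        # debugging: and of this one
    result <- f3(x)
    result
}

process(c(3, NA, 1, 2))  # [1] 6
tmp1                     # [1] 3 1 2
tmp2                     # [1] 1 2 3
```

This works from inside a function because environments are modified in place. The values therefore land in the global environment and are still there after process() returns.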

The equivalent expression using pipes is
    x |>
        f1() |>
        f2() |>
        f3() -> result
You can slip lines like 'print() |>' between the pipe parts because
print(x) returns x (invisibly), but it is more tedious to add assignment
lines.  One could define a function like
    pipe_save <- function(x, name, envir=.GlobalEnv) {
        envir[[name]] <- x   # store the intermediate value under 'name'
        x                    # pass x along unchanged
    }
and then put lines like 'pipe_save("tmp1") |>' into the pipe sequence to
save intermediate results.
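The following sketch puts pipe_save() and print() into a pipeline. It needs R 4.1.0 or later for the native |>. The data and the steps abs(), sort() and sum() are only illustrative:

```r
# Requires R >= 4.1.0 for the native pipe |>
pipe_save <- function(x, name, envir = .GlobalEnv) {
    envir[[name]] <- x  # store the intermediate value under 'name'
    x                   # pass x along unchanged
}

result <- c(3, -1, 2) |>
    abs() |>
    pipe_save("tmp1") |>  # snapshot after abs()
    sort() |>
    print() |>            # prints [1] 1 2 3 and passes x on
    pipe_save("tmp2") |>  # snapshot after sort()
    sum()

result  # [1] 6
tmp1    # [1] 3 1 2
tmp2    # [1] 1 2 3
```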

A function like
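    pipe_eval <- function(x, expr) {
        # evaluate 'expr' with 'x' bound to the piped value; other names
        # are looked up in the caller's environment
        eval(substitute(expr), list(x=x), parent.frame())
        x   # pass x along unchanged
    }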
would make it easy to call plot() or summary(), etc., on the piped data
with lines like
   'pipe_eval(print(summary(x))) |>'
inserted into the pipe sequence.

E.g.,

> (1/(1:10)) |>
+    pipe_eval(print(summary(x))) |>
+    range() |>
+    pipe_eval(print(x)) |>
+    sum()
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1000  0.1295  0.1833  0.2929  0.3125  1.0000
[1] 0.1 1.0
[1] 1.1

You could even put if(isTRUE(getOption("debug"))) before the eval() or the
assignment, so that these functions do nothing unless debugging is turned
on; then debugging can be switched on and off with options(debug=TRUE) and
options(debug=FALSE).
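Here is a sketch of what the debug-switched versions might look like. It assumes the option is called "debug", as suggested above. A more specific name would reduce the chance of clashing with other code that reads a generic option of that name. The variable name debug_snap is only illustrative:

```r
# Requires R >= 4.1.0 for the native pipe |>
# Both helpers only return x unless options(debug = TRUE) is set.
pipe_save <- function(x, name, envir = .GlobalEnv) {
    if (isTRUE(getOption("debug")))
        envir[[name]] <- x
    x
}

pipe_eval <- function(x, expr) {
    # 'expr' is captured unevaluated and never evaluated when debugging is off
    if (isTRUE(getOption("debug")))
        eval(substitute(expr), list(x = x), parent.frame())
    x
}

old <- options(debug = FALSE)
r1 <- 1:3 |> pipe_eval(print(x)) |> pipe_save("debug_snap") |> sum()
exists("debug_snap")  # FALSE: nothing was printed or saved

options(debug = TRUE)
r2 <- 1:3 |> pipe_eval(print(x)) |> pipe_save("debug_snap") |> sum()  # prints [1] 1 2 3
debug_snap            # [1] 1 2 3

options(old)          # restore the previous setting
```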

-Bill


On Wed, Dec 9, 2020 at 1:58 PM Timothy Goodman <timsgoodman using gmail.com>
wrote:
>
> On Wed, Dec 9, 2020 at 1:03 PM Duncan Murdoch <murdoch.duncan using gmail.com>
> wrote:
>
> > > Then I could run any number of lines with pipes at the
> > > start and no special character at the end, and have it treated as a
> > > single pipeline.  I suppose that'd need to be a feature offered by the
> > > environment (RStudio's RNotebooks in my case).  I could wrap my
> > > pipelines in parentheses (to make the "pipes at start of line" syntax
> > > valid R code), and then could use the hypothetical "submit selected
> > > code ignoring line-breaks" feature when running just the first part of the
> > > pipeline -- i.e., selecting full lines, but starting after the opening
> > > paren so as not to need to insert a closing paren.
> >
> > I think I don't understand your workflow enough to comment on this.
> >
> > Duncan
> >
> >
> >
> What I mean is, I could add parentheses as suggested to let me put the
> pipes at the start of the line, like this:
>
>     (                                  # Line 1
>         my_data_frame                  # Line 2
>         |> filter(some_condition)      # Line 3
>         |> group_by(some_column)       # Line 4
>         |> summarize(some_functions)   # Line 5
>     )                                  # Line 6
>
> If this gives me an unexpected result, I might want to re-run just up
> through line 3 and check the output, to see if something is wrong with the
> "filter" (e.g., my condition matched less data than expected).  Ideally, I
> could do this without changing the code, by just selecting lines 2 and 3
> and pressing Ctrl+Enter (my environment's shortcut for "run selected
> code").  But it wouldn't work, because without including the parentheses
> these lines would be treated as two separate expressions, the second of
> which is invalid since it starts with a pipe.  Alternatively, I could
> include line 1 in my selection (along with lines 2 and 3), but that
> wouldn't work unless I typed a new closing parenthesis after line 3 and
> deleted it afterwards.  Or, I could select and comment out lines 4 and
> 5, and then select and run all 6 lines.  But none of those are as
> convenient as just being able to select and run lines 2 and 3 (which is
> what I'm used to being able to do in several other languages which support
> pipelines).  And though it may seem a minor annoyance, when I'm working a
> lot with dplyr code I find myself wanting to do something like this many
> times per day.
>
> What *would* work well would be if I could write the code as above, but
> then when I want to select and re-run just lines 2 and 3, I would use some
> keyboard shortcut that meant "pass this code to the parser as a single
> line, with line breaks (and comments) removed".  Then it would be run like
>     my_data_frame |> filter(some_condition)
> instead of producing an error.  That'd require the environment I'm using --
> RStudio -- to support this feature, but wouldn't require any change to how
> R is parsed.  From the replies here, I'm coming around to thinking that'd
> be the better option.
>
> - Tim
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
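The "single line" idea in Tim's message can be imitated in plain R, assuming the editor hands over the selected lines as a character vector. The helper run_as_one() below is hypothetical and is not an RStudio or base R feature. Instead of stripping line breaks and comments, it wraps the selection in parentheses placed on their own lines. Lines starting with |> then continue the expression, and trailing comments stay harmless. Base R's subset() stands in for dplyr's filter():

```r
# Requires R >= 4.1.0 for the native pipe |>
# Hypothetical helper: evaluate selected source lines as ONE expression.
# Wrapping them in "(" and ")" lets lines that start with |> continue the
# previous line; putting ")" on its own line keeps trailing comments harmless.
run_as_one <- function(lines, envir = parent.frame()) {
    code <- paste0("(\n", paste(lines, collapse = "\n"), "\n)")
    eval(parse(text = code, keep.source = FALSE)[[1]], envir)
}

my_data_frame <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))

# "Lines 2 and 3" of a pipes-at-start-of-line pipeline, as an editor might
# pass them; subset() stands in for dplyr::filter()
selected <- c(
    "my_data_frame             # Line 2",
    "|> subset(x > 1)          # Line 3"
)

run_as_one(selected)  # the rows of my_data_frame with x > 1
```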
