[R] Functional Programming Problem Using purrr and R's data.table shift function

Dénes Tóth toth.denes using kogentum.hu
Tue Jan 3 11:48:54 CET 2023


Hi Michael,

R returns the result of the last evaluated expression by default:
```
add_2 <- function(x) {
   x + 2L
}
```

is the same as and preferred over
```
add_2_return <- function(x) {
   out <- x + 2L
   return(out)
}
```

In the idiomatic use of R, one uses explicit `return` when one wants to 
break the control flow, e.g.:
```
add_2_if_number <- function(x) {
   ## early return if x is not numeric
   if (!is.numeric(x)) {
     return(x)
   }
   ## process otherwise (usually more complicated steps)
   ## note: this part will not be reached for non-numeric x
   x + 2L
}
```
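
A quick check of the definitions above, repeated in compact form so the snippet runs on its own (the test inputs are chosen only for illustration):

```r
# Implicit return: the value of the last evaluated expression is returned
add_2 <- function(x) x + 2L

# Explicit return: same result, just more verbose
add_2_return <- function(x) {
  out <- x + 2L
  return(out)
}

# Early return for non-numeric input, implicit return otherwise
add_2_if_number <- function(x) {
  if (!is.numeric(x)) {
    return(x)
  }
  x + 2L
}

same_result <- identical(add_2(1L), add_2_return(1L))  # TRUE
num_result  <- add_2_if_number(1L)                     # 3L
chr_result  <- add_2_if_number("a")                    # "a", returned unchanged
```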

So yes, you should drop the last "%>% `[`" altogether as `[.data.table` 
already returns the whole (modified) data.table when `:=` is used.
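
As a minimal sketch of what that could look like (the function name `lag_values` is made up for this example, and the trailing `[]` is the auto-print idiom from my earlier side note):

```r
library(data.table)

# Hypothetical rewrite of the function from the original post:
# no pipe, no explicit return()
lag_values <- function(A) {
  DT <- data.table(A)  # creates a new data.table with a column named 'A'
  DT[, A := shift(A, fill = NA, type = "lag", n = 1)]  # modifies DT by reference
  DT[]  # last expression is the return value; `[]` lets auto-print work
}

res <- lag_values(c(1, 2))
res
```

Here `res` is a one-column data.table whose first value is `NA` and whose second value is `1`.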

Side note: If you use R >= 4.1.0 and do not need the special features of 
`%>%`, try the native `|>` operator first (see `?pipeOp`): 1) you do not 
depend on a user-contributed package, and 2) it works at the parser level.
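
A small illustration of the native pipe, assuming R >= 4.1.0 (`x` and `f` in the second part are placeholder symbols that are never evaluated):

```r
# The native pipe passes the left-hand side as the first argument of the call
total <- c(1, 4, 9) |> sqrt() |> sum()
total  # 6

# "Works at the parser level": the pipe is rewritten into an ordinary call
# while the code is parsed, so no pipe function is called at run time
parsed_same <- identical(quote(x |> f()), quote(f(x)))
parsed_same  # TRUE
```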

Cheers,
Denes

On 1/2/23 18:59, Michael Lachanski wrote:
> Dénes, thank you for the guidance - which is well-taken.
> 
> Your side note raises an interesting question: I find the piping %>% 
> operator readable. Is there any downside to it? Or is the side note 
> meant to tell me to drop the last: "%>% `[`"?
> 
> Thank you,
> 
> 
> ==
> Michael Lachanski
> PhD Student in Demography and Sociology
> MA Candidate in Statistics
> University of Pennsylvania
> mikelach using sas.upenn.edu <mailto:mikelach using sas.upenn.edu>
> 
> 
> On Sat, Dec 31, 2022 at 9:22 AM Dénes Tóth <toth.denes using kogentum.hu 
> <mailto:toth.denes using kogentum.hu>> wrote:
> 
>     Hi Michael,
> 
>     Note that you have to be very careful when using by-reference operations
>     in data.table (see `?data.table::set`), especially in a functional
>     programming approach. In your function, you avoid this problem by
>     calling `data.table(A)` which makes a copy of A even if it is already a
>     data.table. However, for large data.table-s, copying can be a very
>     expensive operation (esp. in terms of RAM usage), which can be totally
>     eliminated by using data.tables in the data.table-way (e.g., joining,
>     grouping, and aggregating in the same step by performing these
>     operations within `[`, see `?data.table`).
> 
>     So instead of blindly functionalizing all your code, try to be
>     pragmatic. Functional programming is not about using pure functions in
>     *every* part of your code base, because it is unfeasible in 99.9% of
>     real-world problems. Even Haskell has `IO` and `do`; the point is that
>     the imperative and functional parts of the code are clearly separated
>     and the imperative components are kept as close to the top level as
>     possible.
> 
>     So when using data.table, a good strategy is to use pure functions for
>     performing within-data.table operations, e.g., `DT[, lapply(.SD, mean),
>     .SDcols = is.numeric]`, and when these operations alter `DT` by
>     reference, invoke the chains of these operations in "pure" wrappers -
>     e.g., calling `A <- copy(A)` on the top and then modifying `A` directly.
> 
>     Cheers,
>     Denes
> 
>     Side note: You do not need to use `DT[ , A:= shift(A, fill = NA, type =
>     "lag", n = 1)] %>% `[`(return(DT))`. `[.data.table` returns the result
>     (the modified DT) invisibly. If you want to let auto-print work, you can
>     just use `DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)][]`.
> 
>     Note that this also means you usually do not need to use magrittr's
>     or base-R pipe when transforming data.table-s. You can do this instead:
>     ```
>     DT[
>         ## filter rows where 'x' column equals "a"
>         x == "a"
>     ][
>         ## calculate the mean of `z` for each gender and assign it to `y`
>         , y := mean(z), by = "gender"
>     ][
>         ## do whatever you want
>         ...
>     ]
>     ```
> 
> 
>     On 12/31/22 13:39, Rui Barradas wrote:
>      > At 06:50 on 31/12/2022, Michael Lachanski wrote:
>      >> Hello,
>      >>
>      >> I am trying to make a habit of "functionalizing" all of my code as
>      >> recommended by Hadley Wickham. I have found it surprisingly difficult
>      >> to do so because several intermediate features from data.table break
>      >> or give unexpected results using purrr and its data.table adaptation,
>      >> tidytable. Here is a minimal working example of what has stumped me
>      >> most recently:
>      >>
>      >> ===
>      >>
>      >> library(data.table); library(tidytable)
>      >>
>      >> minimal_failing_function <- function(A){
>      >>    DT <- data.table(A)
>      >>    DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
>      >>    return(DT)}
>      >> # works
>      >> minimal_failing_function(c(1,2))
>      >> # fails
>      >> tidytable::pmap_dfr(.l = list(c(1,2)),
>      >>                      .f = minimal_failing_function)
>      >>
>      >>
>      >> ===
>      >> These should ideally give the same output, but do not. This also fails
>      >> using purrr::pmap_dfr rather than tidytable. I am using R 4.2.2 and I
>      >> am on Mac OS Ventura 13.1.
>      >>
>      >> Thank you for any help you can provide or general guidance.
>      >>
>      >>
>      >> ==
>      >> Michael Lachanski
>      >> PhD Student in Demography and Sociology
>      >> MA Candidate in Statistics
>      >> University of Pennsylvania
>      >> mikelach using sas.upenn.edu <mailto:mikelach using sas.upenn.edu>
>      >>
>      >> ______________________________________________
>      >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>      >> https://stat.ethz.ch/mailman/listinfo/r-help
>      >> PLEASE do read the posting guide
>      >> http://www.R-project.org/posting-guide.html
>      >> and provide commented, minimal, self-contained, reproducible code.
>      > Hello,
>      >
>      > Use map_dfr instead of pmap_dfr.
>      >
>      >
>      > library(data.table)
>      > library(tidytable)
>      >
>      > minimal_failing_function <- function(A) {
>      >    DT <- data.table(A)
>      >    DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
>      >    return(DT)
>      > }
>      >
>      > # works
>      > tidytable::map_dfr(.x = list(c(1,2)),
>      >                     .f = minimal_failing_function)
>      > #> # A tidytable: 2 × 1
>      > #>       A
>      > #>   <dbl>
>      > #> 1    NA
>      > #> 2     1
>      >
>      >
>      > Hope this helps,
>      >
>      > Rui Barradas
>      >
>


