[R] tidyverse: grouped summaries (with summarize) [RESOLVED]

Avi Gross
Tue Sep 14 01:44:14 CEST 2021


Just FYI, Rich, the pipeline idiom allows, but does not require, the form you used.

Yours was:

    RESULT <-
      DATAFRAME %>%
      FN1(args) %>%
      ...
      FNn(args)
    
But equally valid are forms that assign the result at the end:

    DATAFRAME %>%
    FN1(args) %>%
    ...
    FNn(args) -> RESULT

Or forms that pass the data frame directly as the first argument of the first function, and pipe only from there on:

    FN1(DATAFRAME, args) %>%
    ...
    FNn(args) -> RESULT
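
To make the three forms concrete, here is a minimal sketch. It assumes dplyr is installed, and it uses the built-in mtcars data and the column names cyl and mpg purely as stand-ins; none of that comes from Rich's script.

```r
library(dplyr)

# Form 1: name the result first, then build the pipeline.
res1 <-
  mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# Form 2: same pipeline, result assigned at the end with ->
mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg)) -> res2

# Form 3: the data frame is passed directly to the first function,
# and only the later steps are piped.
group_by(mtcars, cyl) %>%
  summarise(mean_mpg = mean(mpg)) -> res3

# All three should hold the same summary table.
stopifnot(isTRUE(all.equal(res1, res2)),
          isTRUE(all.equal(res1, res3)))
```

The right-pointing assignment works because -> binds more loosely than %>%, so the whole pipeline is evaluated before anything is assigned.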

If you read some tutorials, you will find many other things you can do. There are variants of the pipe symbol that behave differently, ways to put the piped value somewhere other than the first argument of the function that follows, and lots more. Some people spend most of their programming time almost purely in tidyverse functions without looking much at base R.
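
As a rough illustration of those extras, the sketch below uses the magrittr package, which is where %>% comes from. It assumes magrittr is installed; the mtcars data and the particular calls (lm, cor) are only examples I picked, not anything from this thread.

```r
library(magrittr)

# Dot placeholder: lm() wants the data as its 'data' argument, not its
# first argument, so the dot marks where the piped value should go.
fit <- mtcars %>% lm(mpg ~ wt, data = .)

# Exposition pipe %$%: exposes the columns of the left-hand side by name
# to a function that has no data argument.
r <- mtcars %$% cor(mpg, wt)

# Tee pipe %T>%: runs the right-hand side for its side effect (here,
# printing the dimensions) and passes the original value along unchanged.
n <- mtcars %T>% { print(dim(.)) } %>% nrow()
```

Whether these variants make code clearer or more cryptic is a matter of taste; the plain %>% covers most everyday use.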

I am not saying that is a good thing.


-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rich Shepard
Sent: Monday, September 13, 2021 7:04 PM
To: r-help using r-project.org
Subject: Re: [R] tidyverse: grouped summaries (with summarize) [RESOLVED]

On Mon, 13 Sep 2021, Avi Gross via R-help wrote:

> As Eric has pointed out, perhaps Rich is not thinking pipelined. Summarize() takes a first argument as:
> 	summarise(.data=whatever, ...)
>
> But in a pipeline, you OMIT the first argument and let the pipeline supply an argument silently.

Avi,

Thank you. I read your message carefully and re-read the example on the bottom of page 60 and top of page 61. I then changed the command to:
disc_by_month = disc %>%
     group_by(year, month) %>%
     summarize(vol = mean(cfs, na.rm = TRUE))

And, the script now returns what I need:
> disc_by_month
# A tibble: 66 × 3
# Groups:   year [7]
     year month     vol
    <int> <int>   <dbl>
  1  2016     3 221840.
  2  2016     4 288589.
  3  2016     5 255164.
  4  2016     6 205371.
  5  2016     7 167252.
  6  2016     8 140465.
  7  2016     9  97779.
  8  2016    10 135482.
  9  2016    11 166808.
 10  2016    12 165787.
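
For anyone who wants to reproduce the pattern without the original data, here is a small sketch. The toy 'disc' data frame and its values are invented for illustration only, and dplyr 1.0.0 or later is assumed for the .groups argument. It also shows why the output above still reports "Groups: year": with the default settings summarize() drops only the last grouping variable.

```r
library(dplyr)

# Toy stand-in for 'disc'; these numbers are made up.
disc <- data.frame(
  year  = c(2016L, 2016L, 2016L, 2017L, 2017L),
  month = c(3L, 3L, 4L, 3L, 3L),
  cfs   = c(100, NA, 300, 50, 150)
)

# Same shape of command as in the message: mean flow per year and month.
disc_by_month <- disc %>%
  group_by(year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE))

# The result is still grouped by the remaining variable.
group_vars(disc_by_month)

# Asking for .groups = "drop" returns an ungrouped tibble instead.
disc_flat <- disc %>%
  group_by(year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")

group_vars(disc_flat)
```

Leaving the result grouped is harmless for printing, but later steps such as mutate() or another summarize() will operate within each year unless the grouping is dropped.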

I missed the beginning of the command where the resulting dataframe needs to be named first.

This clarifies my understanding and I appreciate your and Eric's help.

Regards,

Rich
