[R] tidyverse: grouped summaries (with summArize)

Avi Gross
Tue Sep 14 00:43:03 CEST 2021


I think we have wandered away from base R into a package (dplyr), but the request seems easy enough.

Just FYI, Rich: you seem not to have incorporated yet the advice we gave about the first argument to summarize(), and your use of group_by() is a tad odd.

disc %>%
     group_by(hour) %>%
     group_by(day) %>%
     group_by(year, month) %>%
     summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
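A minimal sketch of the first-argument point, using a small made-up tibble (the name `toy` and its values are hypothetical stand-ins for disc): `%>%` passes its left-hand side as the first argument of the next call, so inside a pipe summarize() already has its data and needs only the summary expressions.

```r
library(dplyr)

# Hypothetical toy data standing in for disc
toy <- tibble(
  year  = c(2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L, 4L),
  cfs   = c(149000, 151000, 150000, NA)
)

# x %>% f(y) is evaluated as f(x, y), so these two calls are equivalent;
# passing the data frame again inside the pipe only adds an unwanted extra argument
piped  <- toy %>% summarize(vol = mean(cfs, na.rm = TRUE))
direct <- summarize(toy, vol = mean(cfs, na.rm = TRUE))
identical(piped, direct)  # TRUE
```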

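Not sure why you pipe in disc and then hand disc_by_month to summarize() as a second, superfluous data argument. Also, if you read the manual page for group_by() https://dplyr.tidyverse.org/reference/group_by.html you may note it is normally called ONCE, with several arguments naming, in order, the columns of the data.frame to group by. By default each new call to group_by() replaces the existing grouping instead of adding to it (unless you set .add = TRUE), so in your pipeline only the last call, group_by(year, month), has any effect.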
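A small sketch of the replace-versus-add behaviour, again on a hypothetical `toy` tibble with the same column names as disc; group_vars() reports which columns a tibble is currently grouped by.

```r
library(dplyr)

# Hypothetical toy data with the same column names as disc
toy <- tibble(
  year  = 2016L,
  month = c(3L, 3L, 4L),
  day   = c(3L, 3L, 1L),
  hour  = c(12L, 13L, 0L),
  cfs   = c(149000, 153000, 150000)
)

# Chained calls: each group_by() overrides the previous grouping (.add = FALSE),
# so only the last call survives
chained <- toy %>% group_by(hour) %>% group_by(day) %>% group_by(year, month)
group_vars(chained)   # "year" "month"

# A single call with several columns groups by all of them, in that order
combined <- toy %>% group_by(year, month, day, hour)
group_vars(combined)  # "year" "month" "day" "hour"

# To extend an existing grouping instead, set .add = TRUE (dplyr >= 1.0.0)
added <- toy %>% group_by(year, month) %>% group_by(day, .add = TRUE)
group_vars(added)     # "year" "month" "day"
```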

disc %>%
     group_by(hour, day, year, month) %>%
     summarize(vol = mean(cfs, na.rm = TRUE))

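Not sure most people would group that way: the summary comes out ordered by the grouping columns in the order given, so the above sorts by hour first. Many might reverse that sequence, as in group_by(year, month, day, hour). And if what you want is one mean per month, group_by(year, month) alone is enough; adding day and hour gives one mean per hour.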
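A sketch of the two groupings side by side on hypothetical toy data (disc itself has far more rows). The .groups = "drop" argument is an optional choice here that simply returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data; the real disc has ~590,000 rows
toy <- tibble(
  year  = 2016L,
  month = c(3L, 3L, 3L, 4L, 4L),
  day   = c(3L, 3L, 4L, 1L, 1L),
  hour  = c(12L, 13L, 0L, 0L, 0L),
  cfs   = c(149000, 151000, 150000, 160000, NA)
)

# One mean per month: group by year and month only
monthly <- toy %>%
  group_by(year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")
monthly

# One mean per hour, with groups listed from the coarsest to the finest unit,
# so the result is ordered year, then month, then day, then hour
hourly <- toy %>%
  group_by(year, month, day, hour) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")
hourly
```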

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rich Shepard
Sent: Monday, September 13, 2021 6:32 PM
To: R mailing list <r-help using r-project.org>
Subject: Re: [R] tidyverse: grouped summaries (with summerize)

On Tue, 14 Sep 2021, Eric Berger wrote:

> This code is not correct:
> disc_by_month %>%
>     group_by(year, month) %>%
>     summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
> It should be:
> disc %>% group_by(year,month) %>% summarize(vol=mean(cfs,na.rm=TRUE))

Eric/Avi:

That makes no difference:
> disc_by_month
# A tibble: 590,940 × 6
# Groups:   year, month [66]
     year month   day  hour   min    cfs
    <int> <int> <int> <int> <int>  <dbl>
  1  2016     3     3    12     0 149000
  2  2016     3     3    12    10 150000
  3  2016     3     3    12    20 151000
  4  2016     3     3    12    30 156000
  5  2016     3     3    12    40 154000
  6  2016     3     3    12    50 150000
  7  2016     3     3    13     0 153000
  8  2016     3     3    13    10 156000
  9  2016     3     3    13    20 154000
10  2016     3     3    13    30 155000
# … with 590,930 more rows

I wondered if I need to group first by hour, then day, then year-month.
This, too, produces the same output:

disc %>%
     group_by(hour) %>%
     group_by(day) %>%
     group_by(year, month) %>%
     summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

And printing disc shows the data frame as read in.

I don't understand why the columns are not grouping.

Thanks,

Rich

______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
