[R] tidyverse: grouped summaries (with summarize)

Rich Shepard
Mon Sep 13 22:52:42 CEST 2021


I changed the data files so the date-times are in five separate columns:
year, month, day, hour, and minute. For example:
year,month,day,hour,min,cfs
2016,03,03,12,00,149000
2016,03,03,12,10,150000
2016,03,03,12,20,151000
2016,03,03,12,30,156000
2016,03,03,12,40,154000
2016,03,03,12,50,150000
2016,03,03,13,00,153000
2016,03,03,13,10,156000
2016,03,03,13,20,154000
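
As a quick check of this layout, here is a minimal sketch that reads the first few sample rows with read.csv(). It assumes every field is a clean number; in that case read.csv() already returns integer columns (the leading zeros in "03" and "00" are dropped), so explicit as.integer() conversions should not be needed. The inline text stands in for the real file.

```r
# First rows of the sample data shown above, inlined for illustration
txt <- "year,month,day,hour,min,cfs
2016,03,03,12,00,149000
2016,03,03,12,10,150000
2016,03,03,12,20,151000"

disc <- read.csv(text = txt, header = TRUE)

# Columns holding only whole numbers come back as integer
str(disc)
```

If str() reports a column as character instead, that column contains at least one value that is not a number.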

The script is based on the example (on page 59 of 'R for Data Science'):
library('tidyverse')
disc <- read.csv('../data/water/disc.dat', header = TRUE, sep = ',', stringsAsFactors = FALSE)
disc$year <- as.integer(disc$year)
disc$month <- as.integer(disc$month)
disc$day <- as.integer(disc$day)
disc$hour <- as.integer(disc$hour)
disc$min <- as.integer(disc$min)
disc$cfs <- as.double(disc$cfs)

# use dplyr to group_by() year and month; summarize() to get monthly
# means, sds
disc_by_month <- group_by(disc, year, month)
summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

but my syntax is off because the results are:
> source('disc.R')
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
> ls()
[1] "disc"          "disc_by_month"
> disc_by_month
# A tibble: 590,940 × 6
# Groups:   year, month [66]
     year month   day  hour   min    cfs
    <int> <int> <int> <int> <int>  <dbl>
  1  2016     3     3    12     0 149000
  2  2016     3     3    12    10 150000
  3  2016     3     3    12    20 151000
  4  2016     3     3    12    30 156000
  5  2016     3     3    12    40 154000
  6  2016     3     3    12    50 150000
  7  2016     3     3    13     0 153000
  8  2016     3     3    13    10 156000
  9  2016     3     3    13    20 154000
 10  2016     3     3    13    30 155000
# … with 590,930 more rows
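
One possible reading of this output: the summarise() message shows that the summary was computed, but source() does not print the value of top-level expressions by default (print.eval follows echo, which is FALSE), and summarize() returns a new tibble rather than modifying disc_by_month. A minimal sketch, assuming that is the cause; the April rows and the names monthly and vol_sd are made up for illustration:

```r
library(dplyr)

# Two sample rows from the post plus hypothetical April rows
disc <- data.frame(
  year  = c(2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L, 4L),
  cfs   = c(149000, 150000, 120000, 122000)
)

disc_by_month <- group_by(disc, year, month)

# Assign the result; summarize() does not change its input
monthly <- summarize(
  disc_by_month,
  vol    = mean(cfs, na.rm = TRUE),
  vol_sd = sd(cfs, na.rm = TRUE),
  .groups = "drop"
)

# An explicit print() is needed when the script is run via source()
print(monthly)
```

Setting .groups explicitly also silences the "has grouped output by" message.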

I have the same results if I use as.numeric rather than as.integer and
as.double. What am I doing incorrectly?
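
The two "NAs introduced by coercion" warnings suggest that two of the columns hold some values that are not plain numbers, which would explain why swapping as.integer, as.numeric, and as.double makes no difference. Here is a sketch for locating such values, assuming the file is read with every column as character (for example with colClasses = "character"); the bad entries "n/a" and "--" are invented for illustration:

```r
# Hypothetical raw data, every column read as character
raw <- data.frame(
  year = c("2016", "2016", "2016"),
  min  = c("00", "10", "20"),
  cfs  = c("149000", "n/a", "--"),   # made-up non-numeric entries
  stringsAsFactors = FALSE
)

# TRUE where a non-missing string fails numeric conversion
is_bad <- function(x) {
  !is.na(x) & is.na(suppressWarnings(as.numeric(x)))
}

bad <- sapply(raw, is_bad)      # logical matrix, one column per variable
print(colSums(bad))             # unparseable entries per column
print(raw[rowSums(bad) > 0, ])  # rows that need a closer look
```

Once the offending rows are known, they can be corrected in the file or declared as missing through the na.strings argument of read.csv().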

TIA,

Rich