[R] Need fresh eyes to see what I'm missing

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Tue Sep 14 18:16:45 CEST 2021


Rich,

I reproduced your problem after re-arranging the code the mailer mangled. I tried variations, such as not using pipes or changing what it is grouped by, and they all show your results on the abbreviated data, along with this message:

`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.

I think I silenced the summarise() message, but it makes me wonder whether an inconsistency was introduced along the way, as what you used is supposed to work and has worked for me in the past.

I note the man page for summarise() mentions that the .groups="..." argument is experimental, and I find it a tad confusing.

I changed your code to this by telling it to keep the grouping in the output the same:

vel_by_month = vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE), .groups="keep")

The change from your code is the addition at the very end of the .groups="keep" argument.

Since I used your limited data, this is all I get:

> vel_by_month
# A tibble: 1 x 3
# Groups:   year, month [1]
   year month  flow
  <int> <int> <dbl>
1  2016     3  1.77

For now, all I did was shut summarise() up.
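
To illustrate why this only quiets the message, here is a minimal sketch, assuming the eight rows quoted from the start of the data file and a reasonably recent dplyr. The object names by_default, kept and dropped are mine, made up for illustration. The .groups argument changes how the result is grouped, not the means themselves:

```r
library(dplyr)

# The eight rows quoted from the start of the data file.
vel <- data.frame(
  year  = 2016L,
  month = 3L,
  fps   = c(1.74, 1.75, 1.76, 1.81, 1.79, 1.75, 1.78, 1.81)
)

# No .groups argument: the last grouping level (month) is dropped,
# which is what the informational message reports.
by_default <- vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE))

# .groups = "keep": the result stays grouped by year and month.
kept <- vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE), .groups = "keep")

# .groups = "drop": the result is not grouped at all.
dropped <- vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE), .groups = "drop")

# Compare the grouping variables left on each result.
group_vars(by_default)
group_vars(kept)
group_vars(dropped)

# The computed means are the same in every case.
all.equal(by_default$flow, kept$flow)
all.equal(by_default$flow, dropped$flow)
```

So any NA or NaN row in the full result has to come from the data itself, not from the grouping option.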

Since I do not have the rest of your data, the question remains where your NA and NaN are introduced. If the change I made above does not resolve it, then, as others suggested, begin by looking at your data more carefully, perhaps starting with the .CSV file and then the data structures in R, along the lines of what you were shown. I find the table() function useful for categorical data with limited choices, as it would show the anomaly as a value that occurs only once.

I see your point about needing fresh eyes. My eyes do not see what you did wrong; I am just following clues you may be ignoring.
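
One way to do that kind of inspection is sketched below. Note that the data here is hypothetical: the second row, with 'xx' as the month, is an anomaly I invented for illustration and is not taken from the real vel.dat. The names txt, raw, not_numeric and bad_rows are likewise just mine. The idea is to read every column as character so nothing is coerced silently, then look for entries that are not plain numbers:

```r
# Hypothetical sample: the 'xx' month is an invented anomaly.
txt <- "year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,xx,03,12,10,1.75
2016,03,03,12,20,1.76"

# Read all columns as character so no coercion happens on input.
raw <- read.csv(text = txt, colClasses = "character")

# table() lists each distinct value with its count; a stray value stands out.
table(raw$month, useNA = "ifany")

# Flag every cell that is not made up of digits and decimal points only
# (missing values are flagged as well).
not_numeric <- sapply(raw, function(col) !grepl("^[0-9.]+$", col))

# Row numbers (counted after the header) with at least one suspect cell.
bad_rows <- which(rowSums(not_numeric) > 0)
raw[bad_rows, ]
```

With the real file, the same steps would use read.csv() on the file path instead of the text argument.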


-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:21 AM
To: r-help using r-project.org
Subject: [R] Need fresh eyes to see what I'm missing

The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81

The script to process it:
library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',', stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)

# use dplyr to filter() by year, month, day; summarize() to get monthly
# means
vel_by_month = vel %>%
     group_by(year, month) %>%
     summarize(flow = mean(fps, na.rm = TRUE))

R's display after running the script:
> source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion

The dataframe created by the read.csv() command:
> head(vel)
   year month day hour min  fps
1 2016     3   3   12   0 1.74
2 2016     3   3   12  10 1.75
3 2016     3   3   12  20 1.76
4 2016     3   3   12  30 1.81
5 2016     3   3   12  40 1.79
6 2016     3   3   12  50 1.75

and the resulting grouping:
> vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
     year month   flow
    <int> <int>  <dbl>
  1     0    NA NaN
  2  2016     3   2.40
  3  2016     4   3.00
  4  2016     5   2.86
  5  2016     6   2.51
  6  2016     7   2.18
  7  2016     8   1.89
  8  2016     9   1.38
  9  2016    10   1.73
10  2016    11   2.01
# … with 57 more rows

I cannot find why line 1 is there. Other data sets don't produce this result.

TIA,

Rich

