[R] tidyverse: read_csv() misses column

Rich Shepard r@hep@rd @end|ng |rom @pp|-eco@y@@com
Mon Nov 1 18:54:58 CET 2021


On Mon, 1 Nov 2021, Bill Dunlap wrote:

> Use the col_types argument to specify your column types. [Why would you
> expect '2009' to be read as a string instead of a number?]. It looks like
> an initial zero causes an otherwise numeric looking entry to be considered
> a string (handy for zip codes in the northeastern US).

> help(read_csv) says the column type guessing is "not robust" and its
> algorithm doesn't seem to be documented in the help file:

Bill,

That makes sense. I read that in the book and forgot about it. I'll specify
the col_types for each column in the read_csv() function.

Specifying column types got me much closer:
> cor_disc <- read_csv("../data/cor-disc.csv", col_names = TRUE, col_types = c("c","c","c","c","c","c","i"))

> cor_disc
# A tibble: 415,263 × 8
    site_nbr  year mon   day   hr    min   tz     disc
    <chr>    <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
  1 14171600  2009 10    23    00    00    PDT    8750
  2 14171600  2009 10    23    00    15    PDT    8750
  3 14171600  2009 10    23    00    30    PDT    8750
  4 14171600  2009 10    23    00    45    PDT    8750
  5 14171600  2009 10    23    01    00    PDT    8750
  6 14171600  2009 10    23    01    15    PDT    8750
  7 14171600  2009 10    23    01    30    PDT    8750
  8 14171600  2009 10    23    01    45    PDT    8730
  9 14171600  2009 10    23    02    00    PDT    8730
 10 14171600  2009 10    23    02    15    PDT    8730
# … with 415,253 more rows

The col_types entry for year was specified as "c" and the one for disc as
"i", but both were read in as doubles. That's a non-issue for disc (discharge
in fps), but year should be a character column, as are month, day, etc.

Have I still missed something in specifying column types?
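
A possible explanation, offered as an assumption rather than a confirmed
diagnosis: the readr documentation describes col_types as NULL, a cols()
specification, or a single compact string with one letter per column. A
character vector such as c("c","c",...) may therefore not be applied element
by element, and the call above also lists seven types for eight columns. The
sketch below uses a small hypothetical sample file (the two data rows and the
temporary path are illustrative, not the real cor-disc.csv) to show both
documented forms:

```r
library(readr)

# Hypothetical sample mirroring the eight columns shown above.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "site_nbr,year,mon,day,hr,min,tz,disc",
  "14171600,2009,10,23,00,00,PDT,8750",
  "14171600,2009,10,23,00,15,PDT,8750"
), csv_path)

# Compact form: ONE string with one letter per column
# (c = character, i = integer); eight letters for eight columns.
compact <- read_csv(csv_path, col_names = TRUE, col_types = "ccccccci")

# Explicit form: default every column to character, override disc.
explicit <- read_csv(
  csv_path,
  col_types = cols(.default = col_character(), disc = col_integer())
)

# Inspect the resulting column classes.
sapply(compact, class)
```

With either form, year and the other date/time fields should come back as
character (leading zeros such as "00" preserved) and disc as integer.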

Regards,

Rich


