[R] tidyverse: read_csv() misses column

Bill Dunlap
Mon Nov 1 18:34:59 CET 2021


Use the col_types argument to specify your column types.  [Why would you
expect '2009' to be read as a string instead of a number?]  It looks like an
initial zero causes an otherwise numeric-looking entry to be considered a
string (handy for zip codes in the northeastern US).
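
A minimal sketch of spelling the types out, in case it helps. The column
names are taken from the str() output quoted below. The temporary file and
its three rows are a made-up stand-in for the real data (the leading zero in
the last row is invented for illustration), and treating disc as a double is
an assumption:

```r
library(readr)

# Hypothetical stand-in for ../data/cor-disc.csv: same column names,
# three illustrative rows, the last with an invented leading zero.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "site_nbr,year,mon,day,hr,min,tz,disc",
  "14171600,2009,10,23,0,0,PDT,8750",
  "14171600,2009,10,23,0,15,PDT,8750",
  "014171600,2009,10,23,0,30,PDT,8730"
), csv_path)

# Declare every column type so nothing is left to the guesser.
cor_disc <- read_csv(
  csv_path,
  col_types = cols(
    site_nbr = col_character(),  # identifier: keep it as text
    year     = col_integer(),
    mon      = col_integer(),
    day      = col_integer(),
    hr       = col_integer(),
    min      = col_integer(),
    tz       = col_character(),
    disc     = col_double()      # assumption: discharge may be fractional
  )
)

str(cor_disc)
```

With explicit types, a value that does not fit its column should be reported
by problems(cor_disc) instead of silently changing the type of the whole
column.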

help(read_csv) says the column type guessing is "not robust", and its
algorithm doesn't seem to be documented in the help file:

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for
more details.

If NULL, all column types will be imputed from guess_max rows on the input
interspersed throughout the file. This is convenient (and fast), but not
robust. If the imputation fails, you'll need to increase the guess_max or
supply the correct types yourself.

...
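
The other option the help text mentions is raising guess_max. A rough sketch,
assuming a made-up 2000-row file whose disc column is numeric everywhere
except the last row (the file, its values, and the "Ice" entry are all
hypothetical):

```r
library(readr)

# Hypothetical file: 1999 numeric disc values followed by one text value.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("year,disc", paste0("2009,", rep(8750, 1999)), "2009,Ice"),
  csv_path
)

# Let the guesser look at every row instead of the default sample.
n_rows <- 2000
late_text <- read_csv(csv_path, guess_max = n_rows)

# spec() reports the column types read_csv() settled on.
spec(late_text)
```

Because the guesser sees the text entry, disc comes back as character rather
than numeric. spec() is a quick way to check what was guessed before deciding
whether to supply col_types yourself.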

-Bill
On Mon, Nov 1, 2021 at 10:16 AM Rich Shepard <rshepard using appl-ecosys.com>
wrote:
>
> On Mon, 1 Nov 2021, Kevin Thorpe wrote:
>
> > I do not have a specific answer to your particular problem. All I can say
> > is when a CSV import doesn’t work, it can mean there is something in the
> > CSV file that is unexpected. When read_csv() fails, I will try read.csv()
> > to compare the results.
>
> Kevin,
>
> Interesting that there's no error:
> cor_disc <- read.csv("../data/cor-disc.csv", header = TRUE)
> ...
> 12496 14171600 2010   3  15 16  45 PDT 1060
> 12497 14171600 2010   3  15 17   0 PDT 1060
> 12498 14171600 2010   3  15 17  15 PDT 1050
> 12499 14171600 2010   3  15 17  45 PDT 1050
>   [ reached 'max' / getOption("max.print") -- omitted 402856 rows ]
> > head(cor_disc)
>    site_nbr year mon day hr min  tz disc
> 1 14171600 2009  10  23  0   0 PDT 8750
> 2 14171600 2009  10  23  0  15 PDT 8750
> 3 14171600 2009  10  23  0  30 PDT 8750
> 4 14171600 2009  10  23  0  45 PDT 8750
> 5 14171600 2009  10  23  1   0 PDT 8750
> 6 14171600 2009  10  23  1  15 PDT 8750
> > str(cor_disc)
> 'data.frame':   415355 obs. of  8 variables:
>   $ site_nbr: chr  "14171600" "14171600" "14171600" "14171600" ...
>   $ year    : int  2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
>   $ mon     : int  10 10 10 10 10 10 10 10 10 10 ...
>   $ day     : int  23 23 23 23 23 23 23 23 23 23 ...
>   $ hr      : int  0 0 0 0 1 1 1 1 2 2 ...
>   $ min     : int  0 15 30 45 0 15 30 45 0 15 ...
>   $ tz      : chr  "PDT" "PDT" "PDT" "PDT" ...
>   $ disc    : int  8750 8750 8750 8750 8750 8750 8750 8730 8730 8730 ...
>
> So, where might I look to see why tidyverse's read_csv() doesn't produce the
> same results?
>
> Regards,
>
> Rich