[R] Date read correctly from CSV, then reformatted incorrectly by R

Spencer Graves
Mon Nov 22 21:33:27 CET 2021


	  I've written Ecfun::asNumericDF to overcome some of the common 
problems with read.table, read.csv, etc.:


https://www.rdocumentation.org/packages/Ecfun/versions/0.2-5/topics/asNumericDF


	  I use it routinely to help parse numbers, dates, etc., that are read 
as character. I'm sure it can be improved.  It's on GitHub in case 
anyone would like to take the time to suggest improvements:


https://github.com/sbgraves237/Ecfun
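
For anyone who wants to see the general idea without installing the package, here is a minimal base-R sketch of this kind of post-read cleanup. It is not the Ecfun implementation: the helper name clean_numeric and the sample data are made up for illustration, and it only handles a leading "$" and comma thousands separators.

```r
# Hypothetical helper (not part of Ecfun): turn strings such as "$1,234.50"
# into numbers by dropping a leading "$" and any comma thousands separators.
clean_numeric <- function(x) {
  as.numeric(gsub(",", "", sub("^\\$", "", x)))
}

# Made-up data frame in which every column was read as character.
raw <- data.frame(
  amount = c("$1,234.50", "99", "2,000"),
  when   = c("28/10/2016", "19/11/2016", "31/12/2016"),
  stringsAsFactors = FALSE
)

# Convert each column explicitly, stating the date format rather than
# letting as.Date() guess it.
clean <- data.frame(
  amount = clean_numeric(raw$amount),
  when   = as.Date(raw$when, format = "%d/%m/%Y")
)

str(clean)
```

Stating the date format explicitly is the important part; the number cleanup would need extending for other currencies or decimal commas.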


	  Hope this helps.
	  Spencer Graves


On 11/20/21 4:13 PM, Avi Gross via R-help wrote:
> This seems to be a topic that comes up periodically. The various ways in R
> and other packages for reading in data often come with methods that simply
> guess wrong, or encounter one or more data items in a column that really do
> not fit, so fields may just by default fall back to a more general type such
> as character or perhaps floating point.
> 
> There are ways that some such programs can be given a hint of what you
> expect or even be supplied with a way to coerce them into what you want
> while being read in. But realistically, often a more practical method might
> be to take the data.frame variety you read in and before using it for other
> purposes, check it for validity and make any needed changes. Simplistic ones
> might be to see how many columns were read in to see if it matches
> expectations or generate an error. Or you may trim columns (or rows) that
> are not wanted.
> 
> In that vein, are there existing functions available that will accept what
> types you want one or more columns to be in and that validate if the current
> type is something else and then convert if needed? I mean we have functions
> like as.integer(df$x ) or more flexibly as(df$x, "integer") and you may
> simply build on a set of those and create others to suit any special needs.
> 
> Of course a good method carefully checks the results before over-writing as
> sometimes the result may not be the same length (as shown below) or may
> violate some other ideas or rules:
> 
>> as(c(NULL, NA, 3, 3.1, "3.1", list(1,2,"a")), "character")
> [1] "NA"  "3"   "3.1" "3.1" "1"   "2"   "a"
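
A rough sketch of that "check before overwriting" idea in base R, for illustration only. The helper name convert_checked and the sample data frame are hypothetical, not an existing function or anyone's real data; the helper refuses to return a result whose length changed or in which a previously non-missing value became NA.

```r
# Hypothetical helper: convert a vector with `converter`, but refuse to return
# a result that changed length or turned previously non-missing values into NA.
convert_checked <- function(x, converter) {
  out <- suppressWarnings(converter(x))
  if (length(out) != length(x)) {
    stop("conversion changed length: ", length(x), " -> ", length(out))
  }
  lost <- !is.na(x) & is.na(out)
  if (any(lost)) {
    stop("conversion failed for value(s): ",
         paste(unique(x[lost]), collapse = ", "))
  }
  out
}

# Made-up data frame standing in for something read from a file.
df <- data.frame(x = c("1", "2", "3"), y = c("1", "two", "3"),
                 stringsAsFactors = FALSE)

# Succeeds, so the column is overwritten with integers.
df$x <- convert_checked(df$x, as.integer)

# The second column contains "two", so the helper stops with an error
# instead of silently overwriting the value with NA.
res <- tryCatch(convert_checked(df$y, as.integer),
                error = function(e) conditionMessage(e))
res
```

The same wrapper can be used with a date converter, for example function(v) as.Date(v, format = "%d/%m/%Y"), so that strings that fail to parse are reported rather than quietly becoming NA.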
> 
> So if you have dates in some format, or sometimes an unknown format, there
> are ways, including some others have shown, to make them into some other
> date format or even make multiple columns that together embody the format.
> 
> What people sometimes do is assume software is perfect and should do
> anything they want. It is the other way around and the programmer or data
> creator has some responsibilities to use the right software on the right
> data and that may also mean sanity checks along the way to see if the data
> is what you expect or alter it to be what you need.
> 
> 
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Philip Monk
> Sent: Saturday, November 20, 2021 3:28 PM
> To: Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
> Cc: R-help Mailing List <r-help using r-project.org>
> Subject: Re: [R] Date read correctly from CSV, then reformatted incorrectly
> by R
> 
> Thanks, Jeff.
> 
> I follow what you're doing below, but know I need to read up on Date /
> POSIXct.  Helpful direction!  :)
> 
> On Sat, 20 Nov 2021 at 18:41, Jeff Newmiller <jdnewmil using dcn.davis.ca.us> wrote:
>>
>> Beat me to it! But it is also worth noting that once converted to Date or
>> POSIXct, timestamps should be treated as data without regard to how that
>> data is displayed. When you choose to output that data you will have options
>> as to the display format associated with the function you are using for
>> output.
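
A small illustration of that distinction, using one of the dates from this thread. This is only a sketch; the variable name d is arbitrary.

```r
# One stored value, several display formats.
d <- as.Date("28/10/2016", format = "%d/%m/%Y")

unclass(d)               # the underlying data: days since 1970-01-01
format(d)                # default ISO 8601 display: "2016-10-28"
format(d, "%d/%m/%Y")    # day/month/year display: "28/10/2016"
format(d, "%Y%m%d")      # compact display: "20161028"

# Date arithmetic works on the stored value, not on the display.
d + 7
```

Converting to Date once and calling format() only at output time keeps sorting, differences and plotting correct regardless of how the dates are printed.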
>>
>> My take:
>>
>> dta <- read.table( text=
>> "Buffer    28/10/2016    19/11/2016  31/12/2016    16/01/2017    05/03/2017
>> 100    2.437110889    -8.69674895    3.239299816    2.443183304    2.346743827
>> 200    2.524329899    -7.688862068    3.386811734    2.680347706    2.253885237
>> 300    2.100784256    -8.059855835    3.143786507    2.615152896    2.015645973
>> 400    1.985608385    -10.6707206    2.894572791    2.591925038    2.057913137
>> 500    1.824982163    -9.122519736    2.560350727    2.372226799    1.995863839
>> ", header=TRUE, check.names=FALSE, as.is=TRUE)
>>
>> dta
>>
>> library(dplyr)
>> library(tidyr)
>>
>> dt_fmt <- "%d/%m/%Y"
>>
>> dta_long <- (   dta
>>              %>% pivot_longer( cols = -Buffer
>>                              , names_to = "dt_chr"
>>                              , values_to = "LST"
>>                              )
>>              %>% mutate( dt_date = as.Date( dt_chr, format = dt_fmt )
>>                        , dt_POSIXct = as.POSIXct( dt_chr, format = dt_fmt, tz = "Etc/GMT+8" )
>>                        )
>>              )
>>
>> dta_long
>>
>> On November 20, 2021 10:01:56 AM PST, Andrew Simmons <akwsimmo using gmail.com> wrote:
>>> The as.Date function for a character class argument will try reading
>>> in two formats (%Y-%m-%d and %Y/%m/%d).
>>>
>>>
>>> This does not look like the format you have provided, which is why it
>>> doesn't work. Try something like:
>>>
>>>
>>> x <- c("28/10/2016", "19/11/2016", "31/12/2016", "16/01/2016", "05/03/2017")
>>> as.Date(x, format = "%d/%m/%Y")
>>>
>>>
>>> which produces this output:
>>>
>>>
>>>> x <- c("28/10/2016", "19/11/2016", "31/12/2016", "16/01/2016",
>>> "05/03/2017")
>>>> as.Date(x, format = "%d/%m/%Y")
>>> [1] "2016-10-28" "2016-11-19" "2016-12-31" "2016-01-16" "2017-03-05"
>>>>
>>>
>>>
>>> much better than before! I hope this helps
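
To make the failure mode concrete, here is a short sketch of what happens to the example value from the original question with and without an explicit format. The variable names are arbitrary, and the components are extracted with as.POSIXlt() because the printed form of very early years can differ between platforms.

```r
x <- "15/01/2010"

# With no format, as.Date() tries "%Y-%m-%d" and then "%Y/%m/%d". The second
# one matches: "15" is read as the year, "01" as the month and "20" (the first
# two digits of "2010") as the day; the trailing "10" is ignored.
wrong <- as.Date(x)
parts <- as.POSIXlt(wrong)
c(year = parts$year + 1900, month = parts$mon + 1, day = parts$mday)

# Supplying the format that matches the data gives the intended date.
right <- as.Date(x, format = "%d/%m/%Y")
format(right)

# A format whose separators do not match the string (as in the "%d/%m.%y"
# attempt from the original question) yields NA rather than an error.
as.Date(x, format = "%d/%m.%y")
```

Note also that %Y is the four-digit year and %y the two-digit year, so the capital letter matters for data such as 15/01/2010.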
>>>
>>> On Sat, Nov 20, 2021 at 12:49 PM Philip Monk <prmonk using gmail.com> wrote:
>>>
>>>> Thanks Eric & Jeff.
>>>>
>>>> I'll certainly read up on lubridate, and the posting guide (again)
>>>> (this should be in plain text).
>>>>
>>>> CSV extract below...
>>>>
>>>> Philip
>>>>
>>>> Buffer    28/10/2016    19/11/2016    31/12/2016    16/01/2017    05/03/2017
>>>> 100    2.437110889    -8.69674895    3.239299816    2.443183304    2.346743827
>>>> 200    2.524329899    -7.688862068    3.386811734    2.680347706    2.253885237
>>>> 300    2.100784256    -8.059855835    3.143786507    2.615152896    2.015645973
>>>> 400    1.985608385    -10.6707206    2.894572791    2.591925038    2.057913137
>>>> 500    1.824982163    -9.122519736    2.560350727    2.372226799    1.995863839
>>>>
>>>>
>>>> On Sat, 20 Nov 2021 at 17:08, Philip Monk <prmonk using gmail.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Simple but infuriating problem.
>>>>>
>>>>> Reading in CSV of data using :
>>>>>
>>>>> ```
>>>>> # CSV file has column headers with date of scene capture in format dd/mm/yyyy
>>>>> # check.names = FALSE averts R incorrectly processing dates due to '/'
>>>>> data <- read.csv("C:/R_data/Bungala (b2000) julian.csv", check.names = FALSE)
>>>>>
>>>>> # Converts data table from wide (many columns) to long (many rows) and
>>>>> # creates the new object 'data_long'
>>>>> # Column 1 is the 'Buffer' number (100-2000), Columns 2-25 contain monthly
>>>>> # data covering 2 years (the header row being the date, and rows 2-21
>>>>> # being a value for each buffer).
>>>>> # Column headers for columns 2:25 are mutated into a column called 'Date',
>>>>> # values for each buffer and each date into the column 'LST'
>>>>> data_long <- data %>% pivot_longer(cols = 2:25, names_to = "Date", values_to = "LST")
>>>>>
>>>>> # Instructs R to treat the 'Date' column data as a date
>>>>> data_long$Date <- as.Date(data_long$Date)
>>>>> ```
>>>>>
>>>>> Using str(data), I can see that R has correctly read the dates in the
>>>>> format %d/%m/%y (e.g. 15/12/2015) though has the data type as chr.
>>>>>
>>>>> Once changing the type to 'Date', however, the date is reconfigured.
>>>>> For instance, 15/01/2010 (15 January 2010), becomes 0015-01-20.
>>>>>
>>>>> I've tried ```data_long$Date <- as.Date(data_long$Date, format =
>>>>> "%d/%m.%y")```, and also ```tryformat c("%d/%m%y")```, but either
>>>>> the error persists or I get ```NA```.
>>>>>
>>>>> How do I make R change Date from 'chr' to 'date' without it going wrong?
>>>>>
>>>>> Suggestions/hints/solutions would be most welcome.  :)
>>>>>
>>>>> Thanks for your time,
>>>>>
>>>>> Philip
>>>>>
>>>>> Part-time PhD Student (Environmental Science) Lancaster
>>>>> University, UK.
>>>>>
>>>>> ~~~~~
>>>>>
>>>>> I asked a question a few weeks ago and put together the answer I
>>>>> needed from the responses but didn't know how to say thanks on this
>>>>> list. So, thanks Andrew Simmons, Bert Gunter, Jeff Newmiller and
>>>>> Daniel Nordlund!
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --
>> Sent from my phone. Please excuse my brevity.
> 