[R] How to convert European short dates to ISO format?

Richard O'Keefe
Thu Jun 11 11:31:58 CEST 2020


I would add to this that in an important data set I was working with,
most of the dates were dd/mm/yy but some of them were mm/dd/yy and
that led to the realisation that I couldn't *tell* for about 40% of
the dates which they were.  If they were all one or the other, no
worries, but when you have people from mixed backgrounds writing in
mixed formats, you have a problem.
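
To make the problem concrete, here is a rough sketch (not the code I
used on that data set) of how one might count the dates that cannot be
resolved from the string alone.  The helper name classify_short_date
and the sample values are made up for illustration, and the check only
looks at whether each of the first two fields could be a month.

```r
# Hypothetical helper: classify each "a/b/yy" string by which of the two
# day/month readings is possible, judging only by the first two fields.
# It does not validate day-of-month ranges (e.g. 31/02/20 passes as dd/mm/yy).
classify_short_date <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  a <- as.integer(vapply(parts, `[`, character(1), 1))
  b <- as.integer(vapply(parts, `[`, character(1), 2))
  ifelse(a > 12 & b <= 12, "dd/mm/yy",        # first field cannot be a month
  ifelse(b > 12 & a <= 12, "mm/dd/yy",        # second field cannot be a month
  ifelse(a == b & a <= 12, "same either way", # both readings give the same date
  ifelse(a <= 12 & b <= 12, "ambiguous", "invalid"))))
}

dates <- c("23/11/99", "11/23/99", "01/02/03", "10/11/12")  # made-up examples
classes <- classify_short_date(dates)
print(data.frame(dates, classes))

# Share of dates that cannot be resolved from the string alone
print(mean(classes == "ambiguous"))
```

Anything labelled "ambiguous" needs outside knowledge (who entered it,
which form it came from) before it can be converted safely.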

On Thu, 11 Jun 2020 at 19:17, Martin Maechler <maechler using stat.math.ethz.ch>
wrote:

> >>>>> Rich Shepard
> >>>>>     on Wed, 10 Jun 2020 07:44:49 -0700 writes:
>
>     > On Wed, 10 Jun 2020, Jeff Newmiller wrote:
>     >> Fix your format specification?  ?strptime
>
>     >>> I have been trying to convert European short dates
>     >>> formatted as dd/mm/yy into the ISO 8601 but the function
>     >>> as.Date interprets them as American ones (mm/dd/yy),
>     >>> thus I get:
>
>     > Look at Hadley Wickham's 'tidyverse' collection as
>     > described in R for Data Science. There are date, datetime,
>     > and time functions that will do just what you want.
>
>     > Rich
>
> I strongly disagree that automatic guessing of date format is a
> good idea:
>
> If you have dates such as 01/02/03, 10/11/12, ...
> you cannot have software (or a human, for that matter) *guess* for
> you what they mean.  You have to *know*, or get that knowledge
> "exogenously", i.e., from context (say "metadata" if you want) that you
> as data analyst must have before you can reliably work with that
> data.
>
> There is a global standard (ISO 8601) for dates, e.g., 2020-06-11 for today.
> Such dates have the huge advantage that alphabetical ordering is
> equivalent to time ordering ... and honestly I don't see why
> smart people (such as most? R users) do not all use these much
> more often, notably when it comes to data.
>
> But as long as most people in the world don't use that format
> and practically all default formats for dates (e.g. in
> spreadsheets and computer locales) do not use the ISO
> standard, but rather regional conventions, one must add
> metadata to have a 100% guarantee of using the correct format.
>
> Of course, you can often guess correctly with very high
> (subjective) probability, e.g., 11/23/99 is very probably
> the 23rd of Nov, 1999 ... and indeed if you have more than a few
> dates, it often helps to guess correctly.  But there's no
> guarantee.
>
> No, I state that it is much better to ask the data analyst
> to use their brain a little bit and enter the date format
> explicitly than to use software that guesses it for them
> correctly most of the time.  How would they find out at all in
> the rare cases where the automatic guess is wrong?
>
> Martin Maechler
> ETH Zurich  and  R Core team
>
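
For completeness, a minimal sketch of what "enter the date format
explicitly" looks like in base R, assuming you already know from
context that the input is dd/mm/yy.  The sample values are made up;
the conversion specifications are those documented in ?strptime.

```r
x <- c("11/06/20", "01/02/03", "23/11/99")  # made-up values, assumed dd/mm/yy

# State the format explicitly instead of relying on a default or a guess.
# Note that %y maps 00-68 to 20xx and 69-99 to 19xx (see ?strptime).
d <- as.Date(x, format = "%d/%m/%y")

# ISO 8601 calendar dates as character strings
iso <- format(d, "%Y-%m-%d")
print(iso)

# Sorting the ISO strings alphabetically gives the same order as
# sorting the underlying dates.
stopifnot(identical(sort(iso), format(sort(d), "%Y-%m-%d")))

# The same strings read as mm/dd/yy give different dates, or NA where
# the reading is impossible (there is no month 23).
print(as.Date(x, format = "%m/%d/%y"))
```

Only the last input fails visibly under the wrong format; the other
two are silently converted to the wrong day, which is the danger
Martin describes.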
