[R] How to convert European short dates to ISO format?

Martin Maechler
Thu Jun 11 09:17:28 CEST 2020


>>>>> Rich Shepard 
>>>>>     on Wed, 10 Jun 2020 07:44:49 -0700 writes:

    > On Wed, 10 Jun 2020, Jeff Newmiller wrote:
    >> Fix your format specification?  ?strptime

    >>> I have been trying to convert European short dates
    >>> formatted as dd/mm/yy into ISO 8601, but the function
    >>> as.Date interprets them as American ones (mm/dd/yy),
    >>> thus I get:

    > Look at Hadley Wickham's 'tidyverse' collection as
    > described in R for Data Science. There are date, datetime,
    > and time functions that will do just what you want.

    > Rich

I strongly disagree that automatic guessing of date format is a
good idea:

If you have dates such as 01/02/03, 10/11/12, ...
no software (and no human either) can *guess* for you what
they mean.  You have to *know*, or get that knowledge
"exogenously", i.e., from context (say "meta data" if you want)
that you as data analyst must have before you can reliably work
with that data.
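
To make the ambiguity concrete, here is a small sketch in base R.
The three format strings are only my choice of plausible
candidates; every one of them parses the same strings without any
warning, and each gives a different date:

```r
x <- c("01/02/03", "10/11/12")

# The same strings under three explicit, equally "valid" formats
dmy <- as.Date(x, format = "%d/%m/%y")  # day/month/year
mdy <- as.Date(x, format = "%m/%d/%y")  # month/day/year
ymd <- as.Date(x, format = "%y/%m/%d")  # year/month/day

format(dmy)  # "2003-02-01" "2012-11-10"
format(mdy)  # "2003-01-02" "2012-10-11"
format(ymd)  # "2001-02-03" "2010-11-12"
```

Nothing in the data itself says which of the three is right.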

There is a global standard (ISO 8601) for dates, e.g., 2020-06-11
for today.  Such dates have the huge advantage that alphabetical
ordering is equivalent to time ordering ... and honestly I don't
see why smart people (such as most? R users) do not all use them
much more often, notably when it comes to data.
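
A quick illustration of the ordering property, again as a sketch
in base R with made-up example dates: sorting ISO strings as plain
text gives chronological order, while sorting the same dates
written as dd/mm/yyyy does not.

```r
iso <- c("2020-06-11", "1999-11-23", "2003-02-01")
eu  <- format(as.Date(iso), "%d/%m/%Y")  # same dates, dd/mm/yyyy

iso_sorted <- sort(iso)  # plain character sort
eu_sorted  <- sort(eu)   # plain character sort

iso_sorted  # "1999-11-23" "2003-02-01" "2020-06-11"  (chronological)
eu_sorted   # "01/02/2003" "11/06/2020" "23/11/1999"  (ordered by day)
```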

But as long as most people in the world don't use that format
and practically all default formats for dates (e.g. in
spreadsheets and computer locales) do not use the ISO
standard, but rather regional conventions, one must add meta
data to have a 100% guarantee of using the correct format.

Of course, you can often guess correctly with very high
(subjective) probability, e.g., 11/23/99 is very probably
the 23rd of Nov, 1999 ... and indeed, having more than a few
dates often helps to guess correctly.  But there's no
guarantee.
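
As a rough sketch of why more dates help and why there is still
no guarantee: one can count, for each candidate format, how many
values fail to parse.  The candidate list and the example vectors
below are my own assumptions, not anything from the original
question.

```r
candidates <- c(dmy = "%d/%m/%y", mdy = "%m/%d/%y")

# Number of values that do NOT parse under a given format
n_failed <- function(x, fmt) sum(is.na(as.Date(x, format = fmt)))

# A value with "23" in second position rules out day/month/year ...
x1 <- c("11/23/99", "01/02/03")
fail1 <- sapply(candidates, n_failed, x = x1)
fail1  # dmy: 1, mdy: 0

# ... but if no field exceeds 12, both formats fit equally well
x2 <- c("01/02/03", "10/11/12")
fail2 <- sapply(candidates, n_failed, x = x2)
fail2  # dmy: 0, mdy: 0
```

In the second case the data carry no evidence at all, so any
choice the software makes is a pure guess.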

No, I maintain that it is much better to ask data analysts
to use their brains a little bit and enter the date format
explicitly than to use software that guesses it for them
correctly most of the time.  How would they ever find out, in
the rare cases where the automatic guess is wrong?
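
One way to put this into practice, as a minimal sketch: a small
wrapper around as.Date() that insists on an explicit format and
stops if any value does not match it.  The name
parse_dates_strict() is hypothetical; it is not a function from
base R or from any package.

```r
# Hypothetical helper: 'format' has no default and is checked up front,
# so the caller has to state the convention the data were recorded in.
parse_dates_strict <- function(x, format) {
  stopifnot(is.character(format), length(format) == 1L)
  d <- as.Date(x, format = format)
  bad <- !is.na(x) & is.na(d)  # present in the input but unparseable
  if (any(bad))
    stop("format '", format, "' does not match: ",
         paste(x[bad], collapse = ", "))
  d
}

ok <- parse_dates_strict(c("23/11/99", "01/02/03"), format = "%d/%m/%y")
format(ok)  # "1999-11-23" "2003-02-01"

# A value that cannot be day/month/year is reported, not silently NA:
res <- tryCatch(parse_dates_strict("11/23/99", format = "%d/%m/%y"),
                error = function(e) conditionMessage(e))
res  # "format '%d/%m/%y' does not match: 11/23/99"
```

Note that this only catches values that are impossible under the
stated format; it cannot tell you that the format itself is the
wrong one for 01/02/03.  That knowledge still has to come from you.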

Martin Maechler
ETH Zurich  and  R Core team


