[Rd] as.character.POSIXt in R devel

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Oct 3 14:46:08 CEST 2022


>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>     on Sun, 2 Oct 2022 08:42:50 +0000 (UTC) writes:

    > With r82904, 'as.character.POSIXt' in R devel is changed. The NEWS item:

    >   as.character(<POSIXt>) now behaves more in line with the
    >   methods for atomic vectors such as numbers, and is no longer
    >   influenced by options().

    > Part of the code:
    > 
    >   s <- trunc(x$sec)
    >   fs <- x$sec - s
    >   r1 <- sprintf("%d-%02d-%02d", 1900 + x$year, x$mon+1L, x$mday)
    >   if(any(n0 <- time != 0)) # add time if not 0
    >     r1[n0] <- paste(r1[n0],
    >                  sprintf("%02d:%02d:%02d%s", x$hour[n0], x$min[n0], s[n0],
    >                         substr(as.character(fs[n0]), 2L, 32L)))


    > * Wrong:

    > The result is wrong when as.character(fs[n0]) has scientific notation.

Yes, you are right.  This is a lapse that I will fix.

    > Example (modified from https://bugs.r-project.org/show_bug.cgi?id=9819):
    > op <- options(scipen = 0, OutDec = ".") # (default setting)
    > x <- as.POSIXlt("2007-07-27 16:11:03.000002")
    > as.character(x)
    > # "2007-07-27 16:11:03.99999999983547e-06"
    > as.character(x$sec - trunc(x$sec))
    > # "1.99999999983547e-06"
    > options(op)

    > 'as.character.POSIXt' could temporarily set option 'scipen' large enough to prevent scientific notation in as.character(fs[n0]).

Yes, something like that.
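
A minimal sketch of one way to keep the fractional part out of
scientific notation.  This is not the fix that went into R: the helper
name frac_part is hypothetical, and it uses format(scientific = FALSE)
with 15 significant digits (an assumption) instead of temporarily
changing 'scipen'.

```r
# Hypothetical helper, not R's actual code: return the fractional part of
# the seconds as a string such as ".5", always in fixed notation.
frac_part <- function(sec, digits = 15L) {
  fs <- sec - trunc(sec)
  vapply(fs, function(f) {
    # scientific = FALSE requests fixed notation whatever 'scipen' is set to
    out <- format(f, digits = digits, scientific = FALSE)
    sub("^0", "", out)   # drop the leading "0"; a zero fraction becomes ""
  }, character(1))
}

frac_part(3.5)     # ".5"
frac_part(2e-06)   # fixed notation, no exponent
```

Each element is formatted on its own so that one very small fraction
does not change the number of decimals shown for the others.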


    > * Too much precision:

    > In some cases with fractional seconds and seconds close to 60, the result has many decimal places even though there is an accurate representation with fewer decimal places. It is actually OK, just unpleasant.

I agree that is unpleasant.
To someone else I had written that we also may need to improve
the number of decimals shown here.
The design has been that it should be "full precision"
as it is for  as.character(<numbers>)

Now, we know that POSIXct cannot be very precise (in its
fractional seconds) but that is very different for POSIXlt where
fractional seconds may have 14 digits after the decimal point.

Ideally we could *store* with the POSIXlt object if it was
produced from a POSIXct one, and hence have only around 6 valid digits
(after the dec.) or not.  As we cannot currently store/save that
info, we kept using "full" precision which may be much more than
is sensible.

    > Example (modified from https://bugs.r-project.org/show_bug.cgi?id=14693):
    > op <- options(scipen = 0, OutDec = ".") # (default setting)
    > x <- as.POSIXlt("2011-10-01 12:34:56.3")
    > x$sec == 56.3 # TRUE

[which may be typical, but may also be platform dependent]

    > print(x$sec, 17)
    > # [1] 56.299999999999997
    > as.character(x)
    > # "2011-10-01 12:34:56.299999999999997"
    > format(x, "%Y-%m-%d %H:%M:%OS1") # short and accurate
    > # "2011-10-01 12:34:56.3"
    > ct <- as.POSIXct(x, tz = "UTC")
    > identical(ct,
    > as.POSIXct("2011-10-01 12:34:56.3", tz = "UTC"))
    > # TRUE
    > print(as.numeric(ct), 17)
    > # [1] 1317472496.3
    > lct <- as.POSIXlt(ct)
    > lct$sec == 56.3 # FALSE
    > print(lct$sec, 17)
    > # [1] 56.299999952316284
    > as.character(ct)
    > # "2011-10-01 12:34:56.299999952316284"
    > options(op)

    > The "POSIXct" case is a little different because some precision is already lost after conversion to "POSIXct".

yes, indeed.

    > In 'as.character.POSIXt', using 'as.character' on the seconds (not separating the fractional part) might be good enough, but a leading zero must be added as necessary.

I think you are right: that may definitely be better...
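
A rough sketch of that suggestion: format the whole seconds value in
one go and add a leading zero below 10.  The helper name fmt_sec is
hypothetical, and format() with 15 significant digits stands in for
as.character() here (an assumption, since the number of digits
as.character() shows depends on the R version).

```r
# Hypothetical helper: seconds as "SS" or "SS.fff", zero-padded to two
# integer digits, without splitting off the fractional part first.
fmt_sec <- function(sec, digits = 15L) {
  vapply(sec, function(s) {
    out <- format(s, digits = digits, scientific = FALSE)
    # pad only non-negative values below 10, e.g. "3.5" -> "03.5"
    if (!is.na(s) && s >= 0 && s < 10) paste0("0", out) else out
  }, character(1))
}

fmt_sec(c(3, 3.5, 56.3))   # "03" "03.5" "56.3"
```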

    > * Different from 'format':

    > - With fractional seconds, the result is influenced by option 'OutDec'.

Thank you.  I was not aware of that.
The reason "of course" being that  as.character(<numeric>)  is
*also* depending on option  OutDec.

I would say that is clearly wrong...  and I think we should
strongly consider changing that:

'OutDec' should influence print()ing and format()ing  but should
*not* influence  as.character()  at least not for basic R types/objects.
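
To illustrate the distinction, a small sketch of how 'OutDec' reaches
format() while sprintf() is unaffected.  What as.character(<numeric>)
returns under this option depends on the R version, so it is left out
on purpose.

```r
op <- options(OutDec = ",")        # decimal comma, for illustration only
formatted <- format(1.5)           # format() honours OutDec
fixed     <- sprintf("%.1f", 1.5)  # sprintf() does not consult OutDec
options(op)                        # restore the previous setting

formatted   # "1,5"
fixed       # "1.5"
```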


    > - From "Printing years" in ?strptime: "For years 0 to 999 most OSes pad with zeros or spaces to 4 characters, and Linux outputs just the number."
    > Because (1900 + x$year) is formatted with %d in 'as.character.POSIXt', years 0 to 999 are output without padding. This differs from 'format' on OSes other than Linux.

Good point.  This should be  amended.
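
One possible amendment, sketched under the assumption that zero-padding
to four digits is the wanted convention (this thread does not settle
that).  The helper name fmt_date is hypothetical; its arguments follow
the POSIXlt conventions (years since 1900, months 0-11).

```r
# Hypothetical helper: "%04d" zero-pads years 0 to 999 to four characters.
fmt_date <- function(year, mon, mday) {
  sprintf("%04d-%02d-%02d", 1900L + year, mon + 1L, mday)
}

fmt_date(-1867L, 3L, 3L)   # year 33: "0033-04-03"
fmt_date(116L, 11L, 6L)    # "2016-12-06"
```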



    > * Behavior with "improper" "POSIXlt" object:

    > - "POSIXlt" object with out-of-bounds components is not normalized.

    > Example (modified from regr.tests-1d.R):
    > op <- options(scipen = 0) # (default setting)
    > x <- structure(
    > list(sec = 10000, min = 59L, hour = 18L,
    > mday = 6L, mon = 11L, year = 116L,
    > wday = 2L, yday = 340L,
    > isdst = 0L, zone = "CET", gmtoff = 3600L),
    > class = c("POSIXlt", "POSIXt"), tzone = "CET")
    > as.character(x)
    > # "2016-12-06 18:59:10000"
    > format(x)
    > # "2016-12-06 21:45:40"
    > options(op)


Yes, we knew that and were not too happy about it, but also not
too unhappy:
After all, help(DateTimeClasses) clearly explains what
POSIXlt objects should look like:

-------------------------------------------------------------------
  Class ‘"POSIXlt"’ is a named list of vectors representing

     ‘sec’ 0-61: seconds.
     ‘min’ 0-59: minutes.
     ‘hour’ 0-23: hours.
     ‘mday’ 1-31: day of the month
     ‘mon’ 0-11: months after the first of the year.
     ‘year’ years since 1900.
     ‘wday’ 0-6 day of the week, starting on Sunday.
     ‘yday’ 0-365: day of the year (365 only in leap years).

     ‘isdst’ Daylight Saving Time ... ... ...
     ................................
     ................................

-------------------------------------------------------------------

We have been aware that as.character() assumes the above specification,
even though other R functions, notably format() which uses
internal (C level; either system (OS) or R's own) strftime(), do
arithmetic (modulo 60, then modulo 24, then modulo month length)
to compute the date "used".

Allowing such  "un-normalized" / out-of-bound  POSIXlt objects
in R has not been documented AFAICS, and has the consequence
that two different POSIXlt objects may correspond to the exact
same time. 

This may be something worth discussing.
In some sense we are discussing how the "POSIXlt" class is defined
(even though an S3 class is never formally defined).
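
For illustration, a sketch of how an out-of-bounds object can be
normalized by a round trip through "POSIXct".  It uses the same
components as the example above, except that the time zone is changed
to "UTC" (an assumption, to avoid depending on the local time zone
database).

```r
# Same components as the example above, but in UTC; sec is out of bounds.
x <- structure(
  list(sec = 10000, min = 59L, hour = 18L,
       mday = 6L, mon = 11L, year = 116L,
       wday = 2L, yday = 340L,
       isdst = 0L, zone = "UTC", gmtoff = 0L),
  class = c("POSIXlt", "POSIXt"), tzone = "UTC")

# Converting to POSIXct does the arithmetic; converting back gives
# components that are all within their documented ranges.
normalized <- as.POSIXlt(as.POSIXct(x))

c(normalized$hour, normalized$min, normalized$sec)   # 21 45 40
format(normalized, "%Y-%m-%d %H:%M:%S")              # "2016-12-06 21:45:40"
```

Both x and normalized denote the same instant, which is exactly the
"two different POSIXlt objects, same time" situation described above.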



    > - With "POSIXlt" object where sec, min, hour, mday, mon,
    > and year components are not all of the same length, recycling is not handled.

Good point.  I tend to agree that this should be improved *and* also
documented: AFAIK, it is also not at all documented  (or is it ??)
that the POSIXlt components should be thought of as recycling.

If we decide we want that, then
once this is documented (and all methods/functions are tested with
such POSIXlt objects) it could also allow considerably smaller
POSIXlt objects, e.g., when all parts are in the same year, or
when all seconds are 0, or ...

    > Example (modified from regr.tests-1d.R):
    > op <- options(scipen = 0) # (default setting)
    > x <- structure(
    > list(sec = c(1,  2), min = 59L, hour = 18L,
    > mday = 6L, mon = 11L, year = 116L,
    > wday = 2L, yday = 340L,
    > isdst = 0L, zone = "CET", gmtoff = 3600L),
    > class = c("POSIXlt", "POSIXt"), tzone = "CET")
    > as.character(x)
    > # c("2016-12-06 18:59:01", "NA NA:NA:02")
    > format(x)
    > # c("2016-12-06 18:59:01", "2016-12-06 18:59:02")
    > options(op)
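
A minimal sketch of what recycling the components could look like.
The helper name recycle_lt is hypothetical, it assumes that no
component has length zero, and the example uses "UTC" instead of "CET"
(an assumption, to keep it independent of the time zone database).

```r
# Hypothetical helper: recycle every POSIXlt component to the length of
# the longest one.
recycle_lt <- function(x) {
  comps <- unclass(x)                  # plain list; keeps names and "tzone"
  n <- max(lengths(comps))
  comps[] <- lapply(comps, rep_len, length.out = n)  # attributes preserved
  class(comps) <- class(x)
  comps
}

x <- structure(
  list(sec = c(1, 2), min = 59L, hour = 18L,
       mday = 6L, mon = 11L, year = 116L,
       wday = 2L, yday = 340L,
       isdst = 0L, zone = "UTC", gmtoff = 0L),
  class = c("POSIXlt", "POSIXt"), tzone = "UTC")

y <- recycle_lt(x)
lengths(unclass(y))                  # every component now has length 2
format(y, "%Y-%m-%d %H:%M:%S")
```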


Thank you for your careful analysis and feedback
on this future R behavior !

Best regards,
Martin



