[Rd] A potential POSIXlt->Date bug introduced in r-devel

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Fri Oct 7 14:52:01 CEST 2022


>>>>> Martin Maechler 
>>>>>     on Thu, 6 Oct 2022 10:15:29 +0200 writes:

>>>>> Davis Vaughan 
>>>>>     on Wed, 5 Oct 2022 17:04:11 -0400 writes:

    >> Hi all,

    >> I think I have discovered a bug in the conversion from POSIXlt to Date that
    >> has been introduced in r-devel.

    >> It affects lubridate, but surprisingly didn't cause test failures there.
    >> Instead it caused test failures in users of lubridate, like slider, arrow,
    >> and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and
    >> at least in slider I have been asked by CRAN to correct this issue before
    >> 2022-10-16.

    >> In r-devel we get the following:

    >> ```
    >> data <- list(
    >> sec = 0,
    >> min = 0L,
    >> hour = 0L,
    >> mday = 31L,
    >> mon = c(0L, NA, 2L),
    >> year = 113L,
    >> wday = 4L,
    >> yday = 30L,
    >> isdst = 0L
    >> )

    >> x <- .POSIXlt(xx = data, tz = "UTC")
    >> x
    >> #> [1] "2013-01-31 UTC" NA               "2013-03-31 UTC"

    >> # Looks right
    >> as.POSIXct(x)
    >> #> [1] "2013-01-31 UTC" NA               "2013-03-31 UTC"



    >> # Weird, where is the `NA`?
    >> as.Date(x)
    >> #> [1] "2013-01-31" "1970-01-01" "2013-03-31"
    >> ```

    > I agree that the above is wrong, i.e., a bug in current  R-devel.

    >> The POSIXlt object is length 3, but is only partially filled out. 

    >> The other elements are all recycled to length 3 upon
    >> conversion to POSIXct or Date. 

    >> But when converting to Date, we lose the `NA` value. I think the
    >> `as.Date()` conversion seems inconsistent with the `as.POSIXct()`
    >> conversion.

    > Yes.  There was another very much related conversation here on R-devel,
    > initiated by Suharto Anggono just a few days ago.

    > This subject, i.e., "partially filled out" POSIXlt objects, was
    > one of the topics, too.

    > See my reply there, notably at the end:

    > https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html
    
    > I do mention that "recycling" of partially filled POSIXlt
    > objects has only partially been implemented in R more generally
    > and was actually asking for comments and further discussion.


    >> It looks like this comes up because the conversion to Date now defaults to
    >> using `sec` if any of the date-like fields are `NA_INTEGER`,

    > yes, because only that allows us to also deal with +/- Inf  etc,
    > as was recently added as a new feature, see the NEWS of R 4.2.0

    > • Not strictly fixing a bug, format()ing and print()ing of
    > non-finite Date and POSIXt values NaN and +/-Inf no longer show
    > as NA but the respective string, e.g., Inf, for consistency with
    > numeric vector's behaviour, fulfilling the wish of PR#18308.

    > i.e., see also R's bugzilla
    > https://bugs.r-project.org/show_bug.cgi?id=18308

    > which actually *also* mentioned an NA problem in Date/Time objects.


    >> but this means  the `NA` in the `mon` field is ignored.

    > which I agree is bogus and we'll fix.

    > Still, I did not get any feedback when asking about documentation
    > etc. of POSIXlt objects ... and I *had* mentioned that I agree that
    > the current partial implementation of "partially filled" POSIXlt, i.e.,
    > recycling of POSIXlt components, should probably be made part of the
    > "definition" of POSIXlt.

    > Have I overlooked an existing definition / contract about these?

    > Martin

I'm still waiting for comments.

Note that  "partially filled" POSIXlt objects do not work correctly in
any version of R.  I mentioned that even length(.) may easily
fail; but there is much more.
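
To make the "partially filled" idea concrete, here is a minimal sketch of
what recycling all components to a common length could look like.  The
helper name recycle_lt() is hypothetical (it is not part of base R), and
the sketch assumes that the longest field defines the length of the
object; it is an illustration, not a proposed implementation.

```r
# Hypothetical helper (not part of base R): recycle every field of a
# partially filled POSIXlt to the length of its longest field.
recycle_lt <- function(x) {
    stopifnot(inherits(x, "POSIXlt"))
    tz <- attr(x, "tzone")          # keep the time zone attribute
    fields <- unclass(x)            # the underlying named list
    n <- max(lengths(fields))       # assumption: longest field wins
    # rep_len() recycles each field; zero-length fields are left as is
    filled <- lapply(fields, function(f) if (length(f)) rep_len(f, n) else f)
    .POSIXlt(filled, tz = tz)
}

# Davis' object from above: only 'mon' has length 3
x <- .POSIXlt(list(sec = 0, min = 0L, hour = 0L, mday = 31L,
                   mon = c(0L, NA, 2L), year = 113L,
                   wday = 4L, yday = 30L, isdst = 0L), tz = "UTC")
lengths(unclass(recycle_lt(x)))     # every field now has length 3
```

After such a step, length(), is.na() etc. could work field by field
without each having to do its own recycling.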

While I can relatively easily fix Davis' case above,
the following example behaves wrongly in current and previous
released versions of R and in R-devel:

dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi,
                                        # "out of range", non-finite, fractions
                     min = 45L, hour = c(21L, 3L, NA, 4L),
                     mday = 6L, mon  = c(11L, NA, 3L),
                     year = 116L, wday = 2L, yday = 340L, isdst = 1L))


Of course that's constructed to be particularly unpleasant.
You can try some of the following "checks" in your version(s) of
R to see that some of the things are misbehaving with it in all (*)
versions of R.
--
*) so I claim boldly


dct <- as.POSIXct(dlt)
(n <- length(dct))
dD  <- as.Date(dlt)
dDc <- as.Date(dct)
dltN <- as.POSIXlt(dct) # "normalized POSIXlt" (with *lost* accuracy):
data.frame(unclass(dltN))
.POSIXltNormalize <- function(x) {
    stopifnot(is.numeric(s <- x$sec))
    x <- as.POSIXlt(as.POSIXct(x)) # and restore the precise seconds :
    ifin <- is.finite(s) & is.finite(x$sec) # (maybe recycling already)
    x$sec[ifin] <- s[ifin] %% 60
    x
}
dlt2 <- .POSIXltNormalize(dlt) # normalized POSIXlt - with accuracy kept
all.equal(dlt2$sec, dltN$sec, tolerance = 0) # .. small (2e-9) difference
stopifnot(all.equal(dlt2, dltN),
          identical(as.POSIXct(dlt2), as.POSIXct(dltN)))
## First show (in a way it also works for older R), then check :
oldR <- getRversion() < "4.2.2"
print(width = 101,
data.frame(dlt, dltN, asCT = dct, asDateCT = dDc,
           asDate = if(oldR) rep_len(dD,  n) else dD ,
           na = is.na(dlt),
           fin = if(oldR) rep_len(NA, n) else is.finite(dlt))
)
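
For Davis' case above, where as.POSIXct() already gives the expected
result, one possible interim workaround in package code is to go through
POSIXct before converting to Date.  This is only a sketch: the helper
name as_date_via_ct() is hypothetical, and it assumes that the "tzone"
attribute of the POSIXlt object is the time zone in which the date
should be taken.

```r
# Hypothetical workaround (not part of base R): convert POSIXlt -> POSIXct
# -> Date, so that NA date fields are handled by the as.POSIXct() path.
as_date_via_ct <- function(x) {
    stopifnot(inherits(x, "POSIXlt"))
    tz <- attr(x, "tzone")[1L]
    if (is.null(tz)) tz <- ""       # "" means the current time zone
    as.Date(as.POSIXct(x), tz = tz)
}

x <- .POSIXlt(list(sec = 0, min = 0L, hour = 0L, mday = 31L,
                   mon = c(0L, NA, 2L), year = 113L,
                   wday = 4L, yday = 30L, isdst = 0L), tz = "UTC")
d <- as_date_via_ct(x)
is.na(d)                            # compare with is.na(as.POSIXct(x))
```

Note that this loses sub-second accuracy, which does not matter for the
Date result, and that it does not help with the more unpleasant 'dlt'
example.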




--
Martin Mächler
ETH Zurich   and  R Core team


