[Rd] as.Date(Inf) displays as 'NA' but is actually 'Inf'

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Wed Mar 6 12:58:00 CET 2019


>>>>> Martin Maechler 
>>>>>     on Wed, 6 Mar 2019 11:51:33 +0100 writes:

>>>>> Gabriel Becker 
>>>>>     on Tue, 5 Mar 2019 22:01:37 -0800 writes:

    >> On Tue, Mar 5, 2019 at 9:54 PM Richard White <w using rwhite.no> wrote:
    >>> Hi Gabriel,
    >>> 
    >>> The point is that it *visually* displays as NA, but is.na() still
    >>> responds as FALSE.
    >>> 
    >>> When I (and I am sure many people) see an NA, we then use is.na(). If we
    >>> see Inf displayed, we then use is.infinite(). With as.Date() this breaks
    >>> down.
    >>> 
    >>> I'm not arguing that as.Date(Inf) should be coerced to NA. I'm arguing
    >>> that as.Date(Inf) should be *visually* displayed as Inf (i.e. the truth!).
    >>> I doubt this would break any existing code, because as.Date(Inf) acts as
    >>> Inf in every way possible, except for when you visually look at the output
    >>> printed on the screen.
    >>> 
    >>> William - For all the other Date bugs, they don't visually display false
    >>> information about the variable's contents. They might give wrong output,
    >>> but the output displayed is what exists inside the variable.
    >>> 
    >>> If we can't trust the R console to display the truth, then we are in a lot
    >>> of trouble.
    >>> 

    >> Well, I think it (subtly) actually is the truth though. What is displayed
    >> when you print a date is the *formatted date string, not the numeric value
    >> stored within the date*. The formatted date string of the infinite date is
    >> actually, correctly, NA, because, for the reasons I pointed out in my last
    >> post, it is indeterminate.

    >>> x = as.Date(Inf, origin = "2018-01-01")

    >>> format(x)

    >> [1] NA


    >> So that is what is happening, both technically and conceptually. For
    >> the record, I'd be surprised by that too, but I think it's a situation of
    >> pieces working correctly individually, but together producing a correct but
    >> unintuitive behavior.

    >> Others may feel differently though, that's just my read on it.

    >> Best,
    >> ~G

    > Thank you Richard and Gabe and Bill (Dunlap),
    > I agree with both of you that the behavior is surprising (to > 99.9% of useRs).

    > Gabe very nicely explains how it happens and also why it does
    > make some sense *and* that a change may be problematic.

    > However, the "principle of least surprise", which I learned very long ago
    > from Doug Bates, is a good "guiding" principle for software design
    > (if you allow it to be weighed against other principles, etc.).

    > Here is a bit of slightly more principled code to show the
    > phenomenon, including the fact noticed by Bill that both
    > as.Date() and format.Date() should probably be tweaked so as
    > to signal warnings (e.g. on integer overflow for too large numbers).

    > ## -------------------------------------------------------------------------
    > xDates <- lapply(c(-Inf, Inf, NA, NaN,
    >                    1e9, 4e9, 1e100, .Machine$double.xmax),
    >                  as.Date, origin = "2000-01-01")
    > str(xDates) # --> first 4 *all* show as  NA
    > sapply(xDates, is.na) # the two +-Inf are not NA
    > (f.D <- sapply(xDates, format))# 1..4: NA, then "negative" but all the same (?!)
    > stopifnot(is.na(f.D)[1:4]) # the formats (of 1..4) *are* all NA !!
    > ## show their true internals -- still contain what was put there :
    > for(d in xDates) dput(d)
    > ## -------------------------------------------------------------------------

    > produces

    >> xDates <- lapply(c(-Inf, Inf, NA, NaN,
    > +                    1e9, 4e9, 1e100, .Machine$double.xmax),
    > +                  as.Date, origin = "2000-01-01")
    >> str(xDates) # --> first 4 *all* show as  NA
    > List of 8
    > $ : Date[1:1], format: NA
    > $ : Date[1:1], format: NA
    > $ : Date[1:1], format: NA
    > $ : Date[1:1], format: NA
    > $ : Date[1:1], format: "2739907-01-04"
    > $ : Date[1:1], format: "-5877641-06-23"
    > $ : Date[1:1], format: "-5877641-06-23"
    > $ : Date[1:1], format: "-5877641-06-23"
    >> sapply(xDates, is.na) # the two +-Inf are not NA
    > [1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
    >> (f.D <- sapply(xDates, format))# 1..4: NA, then "negative" but all the same (?!)
    > [1] NA               NA               NA               NA               "2739907-01-04"  "-5877641-06-23"
    > [7] "-5877641-06-23" "-5877641-06-23"
    >> stopifnot(is.na(f.D)[1:4]) # the formats (of 1..4) *are* all NA !!
    >> ## show their true internals -- still contain what was put there :
    >> for(d in xDates) dput(d)
    > structure(-Inf, class = "Date")
    > structure(Inf, class = "Date")
    > structure(NA_real_, class = "Date")
    > structure(NaN, class = "Date")
    > structure(1000010957, class = "Date")
    > structure(4000010957, class = "Date")
    > structure(1e+100, class = "Date")
    > structure(1.79769313486232e+308, class = "Date")
    >> 

    > ---------

    > What if we left NA (NA_character_ specifically) as the result of format(),
    > but changed the print() method so that it gives better information
    > here?

    > I would argue that -Inf and Inf should show differently than
    > true NA's or NaN's, not least because the infinite past and
    > the infinite future are different concepts.

    > Martin Maechler
    > ETH Zurich (and R Core team)
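
A minimal sketch of the display idea above, assuming a hypothetical helper
show_date() that is not part of base R: it leaves format() untouched and
only relabels the infinite entries, so -Inf and Inf can be told apart from
true NA's. The name and the labels "Inf" / "-Inf" are illustrative
assumptions, not a proposal for the actual print.Date() method.

```r
## Hypothetical helper (an assumption for illustration, not base R):
## keep format()'s result, but label infinite Dates explicitly.
show_date <- function(x) {
    stopifnot(inherits(x, "Date"))
    out <- format(x)      # character vector; NA_character_ for true NA
    v <- unclass(x)       # the underlying numbers (days since 1970-01-01)
    out[is.infinite(v) & v > 0] <- "Inf"    # infinite future
    out[is.infinite(v) & v < 0] <- "-Inf"   # infinite past
    out
}

x <- structure(c(-Inf, Inf, NA, 0), class = "Date")
res <- show_date(x)
print(res)
```

With this, is.na(res) is TRUE only for the true NA element, which matches
what is.na(x) reports on the Date itself.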


One change that would solve these problems would be to allow
<POSIXlt>[["year"]] to be "double" instead of "integer".
Then as.POSIXlt() would return different results, with no integer
overflow, and would still contain the correct numbers, which it
currently cannot (though it should at least warn on integer overflow!).
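
A rough sketch of why double arithmetic would cope with the large values
from the example above. The variable names are illustrative, and the year
is only approximated via the mean Gregorian year length of 365.2425 days;
this is an assumption for illustration, not how as.POSIXlt() computes it.

```r
## The "year" component counts years since 1900.
lt <- as.POSIXlt(as.Date("2000-01-01"))
print(lt$year)            # 100
print(typeof(lt$year))    # storage type of the component in this R version

## Day offsets from the example, relative to origin 2000-01-01
days <- c(1e9, 4e9, 1e100)

## Which offsets no longer fit into a 32-bit integer?
too_big <- days > .Machine$integer.max
print(too_big)

## Approximate calendar year, computed entirely in double precision
approx_year <- 2000 + days / 365.2425
print(floor(approx_year[1]))   # agrees with the "2739907-01-04" shown above
```

The first offset fits into an integer and formats correctly; the other two
do not, and those are the ones that came out as "-5877641-06-23".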

Martin


