[Rd] Date class shows Inf as NA; this confuses the use of is.na()

Martin Maechler maechler using stat.math.ethz.ch
Tue Jun 12 18:28:33 CEST 2018


>>>>> Emil Bode 
>>>>>     on Tue, 12 Jun 2018 12:00:42 +0000 writes:

> I agree that calling it invalid is a bit confusing, but I’m not sure what the wording should be, as the problem is that the conversion to POSIXlt is failing.
> The best solution would be to extend the whole POSIXlt-class, but that’s too much work.
> I’ve done some experiments, and it also seems that the Date class can store larger values than POSIXlt:
> > as.Date(8e9, origin='1970-01-01')==as.Date(9e9, origin='1970-01-01')
> [1] FALSE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))==as.POSIXlt(as.Date(9e9, origin='1970-01-01'))
> [1] TRUE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))
> [1] "-5877641-06-23 UTC"
> # Same for 9e9
> > as.Date(8e9, origin='1970-01-01')>Sys.Date()
> [1] TRUE
> > as.POSIXlt(as.Date(8e9, origin='1970-01-01'))>as.POSIXlt(Sys.Date())
> [1] FALSE
> 
> So the situation as I see it now:
> 
>   *   Having an infinite date may convey some information, so
>       we shouldn’t prohibit it anyway

>   *   Idem for very large values (positive or negative)

Indeed -- good you found that you don't have to go all the way to Inf
... and that is typical (and the reason why one has to solve the
problem anyway, and why Inf is not really a special case in that
sense, but nicely in another sense)!
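
To make the point concrete, here is a minimal sketch in base R (the variable names are made up for illustration): both an infinite Date and a merely very large one are ordinary non-NA doubles underneath, so is.na() is FALSE for either, whatever the printed form looks like.

```r
# Illustration only: 'd_inf' and 'd_big' are hypothetical names.
d_inf <- as.Date(Inf, origin = "1970-01-01")
d_big <- as.Date(8e9, origin = "1970-01-01")

# A Date is stored as a double counting days since 1970-01-01.
unclass(d_inf)          # Inf
unclass(d_big)          # 8e+09

# Neither value is NA, regardless of how it is formatted for printing.
is.na(c(d_inf, d_big))  # FALSE FALSE

# Comparisons on the Date itself use the underlying number.
d_big > Sys.Date()      # TRUE
```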

>   *   But we should warn users that their dates may not be neatly representable, that there is no way to use the default-print
>   *   So for values where the POSIXlt-print fails, I think it’s best to print the numerical value, along with some text warning the user

> So I’ve adapted the format-function a bit more, with behaviour below.
> The details can be adapted of course, but I feel it’s best to print some variant of as.numeric(x) if as.POSIXlt(x) turns out to be unreliable, and further leave is.na()

> 
> format.Date <- function (x, ...)
> {
>   xx <- format(as.POSIXlt(x), ...)
>   names(xx) <- names(x)
>   # TRUE for non-NA dates outside 0001-01-01 .. 9999-12-31 (days since 1970-01-01)
>   out <- !is.na(x) & (as.numeric(x) < -719162 | as.numeric(x) > 2932896)
>   if(any(out)) {
>     xx[out] <- paste('Date with numerical value', as.numeric(x[out]))
>     warning('Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.')
>   }
>   xx
> }
> 
> With the following results:
> 
> > environment(print.Date) <- .GlobalEnv
> > as.Date(Inf, origin='1970-01-01')
> [1] "Date with numerical value Inf"
> Warning message:
> In format.Date(x) :
>   Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.
> 
This looks somewhat reasonable as a workaround for you and for now.

However, I'd propose another route to go for "the next version of R":
When I consider

  > str(unclass(as.POSIXlt.Date(Sys.time() + 1e50)))
  List of 9
   $ sec  : num 0
   $ min  : int 0
   $ hour : int 0
   $ mday : int 23
   $ mon  : int 5
   $ year : int -5879541
   $ wday : int 2
   $ yday : int 173
   $ isdst: int 0
   - attr(*, "tzone")= chr "UTC"
  > 

we see the integer overflow (to negative here) and that all
components but 'sec' (which allows fractions!) are integer.
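
As a rough plausibility check of that overflow, the arithmetic can be sketched as follows. It assumes (this is not stated above, it is only a guess at the mechanism) that the day count ends up in a 32-bit signed integer and wraps to the most negative value; the variable names are hypothetical.

```r
# Assumption: the day count is squeezed into a 32-bit signed integer,
# so any huge value ends up at the most negative int.
int_min_days <- -2^31        # -2147483648, kept as a double here
mean_year    <- 365.2425     # mean Gregorian year length in days

# Year reached by counting that many days back from 1970-01-01.
approx_year <- floor(1970 + int_min_days / mean_year)
approx_year                  # -5877641

# POSIXlt stores 'year' as years since 1900.
approx_year - 1900           # -5879541
```

Under that assumption the estimate agrees with the 'year' component shown above and with the "-5877641-06-23" printed earlier in the thread.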

I think we should allow 'year' to be "double" instead, and so it
could also be +Inf or -Inf and we'd nicely cover 
the conversions from and to 'numeric' -- which is really used
internally for dates and date-times in  POSIXct.
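
A minimal sketch of what a "double" year would buy us, using a made-up helper (year_as_double() is hypothetical and not part of R). It only estimates the year from the mean Gregorian year length, so it can be off by one close to a year boundary; the point is merely that a double neither overflows nor chokes on infinite values.

```r
# Hypothetical helper, not part of R: estimate the calendar year of a Date
# as a double, using the mean Gregorian year length of 365.2425 days.
year_as_double <- function(x) {
  floor(1970 + as.numeric(x) / 365.2425)
}

year_as_double(as.Date("2018-06-12"))                 # 2018
year_as_double(as.Date(8e9, origin = "1970-01-01"))   # about 21.9 million, no overflow
year_as_double(as.Date(Inf, origin = "1970-01-01"))   # Inf
year_as_double(as.Date(-Inf, origin = "1970-01-01"))  # -Inf
```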

Martin

> 
> From: Gabe Becker <becker.gabe using gene.com>
> Date: Monday, 11 June 2018 at 23:59
> To: Emil Bode <emil.bode using dans.knaw.nl>
> Cc: Joris Meys <jorismeys using gmail.com>, Werner Grundlingh <wgrundlingh using gmail.com>, "macqueen1 using llnl.gov" <macqueen1 using llnl.gov>, r-devel <r-devel using r-project.org>
> Subject: Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()
> 
> format.Date <- function (x, ...)
> {
>   xx <- format(as.POSIXlt(x), ...)
>   names(xx) <- names(x)
>   xx[is.na(xx) & !is.na(x)] <- paste('Invalid date:',as.numeric(x[is.na(xx) & !is.na(x)]))
>   xx
> }


