[Rd] Should 0L * NA_integer_ be 0L?

Sat May 23 12:49:21 CEST 2020

>>>>> Michael Chirico 
>>>>>     on Sat, 23 May 2020 18:08:22 +0800 writes:

    > I don't see this specific case documented anywhere (I also tried to search
    > the r-devel archives, as well as I could); the only close reference
    > mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this
    > snippet from R-lang:

    >> In cases where the result of the operation would be the same for all
    >> possible values the NA could take, the operation may return this value.

    > This raises the question -- shouldn't 0L * NA_integer_ be 0L?

    > Because this is an integer operation, and according to this definition of
    > NA:

    >> Missing values in the statistical sense, that is, variables whose value
    >> is not known, have the value NA

    > NA_integer_ should be an unknown integer value between -2^31+1 and 2^31-1.
    > Multiplying any of these values by 0 results in 0 -- that is, the result of
    > the operation would be 0 for all possible values the NA could take.

    > This came up from what seems like an inconsistency to me:

    > all(NA, FALSE)
    > # [1] FALSE
    > NA * FALSE
    > # [1] NA

    > I agree with all(NA, FALSE) being FALSE because we know for sure that all
    > cannot be true. The same can be said of the multiplication -- whether NA
    > represents TRUE or FALSE, the resulting value is 0 (FALSE).

    > I also agree with the numeric case, FWIW: NA_real_ * 0 has to be NA_real_,
    > because NA_real_ could be Inf or NaN, for both of which multiplication by 0
    > gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be NA_real_.

I agree with almost everything you say above ...
except, possibly, the main conclusion.

The problem with your proposed change would be that integer
arithmetic would give a different result than the corresponding
"numeric" computation.
(I don't remember other such cases in R, at least as long as the
 integer arithmetic does not overflow.)
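
To make the comparison concrete, here is a small sketch of the
behaviour as I understand it in current R (the variable names are only
for illustration): both storage modes propagate the NA, and overflow
is the case where integer and double results do differ.

```r
# Both storage modes currently propagate the NA.
int_res <- 0L * NA_integer_   # integer NA
dbl_res <- 0  * NA_real_      # double NA
stopifnot(is.na(int_res), is.na(dbl_res))

# Why the double case has to stay NA: 0 times a non-finite value is NaN, not 0.
inf_res <- 0 * Inf
stopifnot(is.nan(inf_res))

# The overflow exception: integer arithmetic yields NA (with a warning),
# whereas the corresponding double computation yields the number.
int_overflow <- suppressWarnings(.Machine$integer.max + 1L)
dbl_overflow <- as.numeric(.Machine$integer.max) + 1
stopifnot(is.na(int_overflow), dbl_overflow == 2^31)
```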

One principle used to decide such problems in S and R has been that
the user should typically *not* have to know whether their data is
stored in float/double or in integer, and the results should be the same
(possibly apart from staying integer for some operations).
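
As an illustration of that principle (a minimal sketch; x_int and
x_dbl are made-up names), the same data stored either way gives the
same values, and only the storage mode of the result may differ:

```r
x_int <- c(0L, 3L, NA)        # stored as integer
x_dbl <- as.numeric(x_int)    # same values, stored as double

# Same values whichever storage mode the user happens to have ...
same_values <- identical(as.numeric(x_int * 0L), x_dbl * 0)
stopifnot(same_values)

# ... apart from some operations staying integer and others not.
type_mult <- typeof(x_int * 2L)   # multiplication stays integer
type_div  <- typeof(x_int / 2L)   # division always returns double
stopifnot(type_mult == "integer", type_div == "double")
```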

{{Note that there are also situations where it's really
  undesirable that    0 * NA   does *not* give 0 (but NA);
  notably in sparse matrix operations, where you very often
  know that the NA was not Inf (or NaN) and you really would like to
  preserve sparseness ...}}
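
For code that does know its NAs stand for finite values, one could
write the rule explicitly. The helper below is purely hypothetical
(it is not part of R or of any package) and assumes equal-length
vectors; it is a sketch of the idea, not a recommendation:

```r
# Hypothetical helper: elementwise product in which a known zero factor
# gives 0 even when the other factor is NA.  Only valid under the
# assumption that every NA stands for a *finite* value; note that it
# also maps 0 * Inf to 0, which ordinary arithmetic does not.
mult_assuming_finite <- function(x, y) {
  stopifnot(length(x) == length(y))
  res <- x * y
  known_zero <- (!is.na(x) & x == 0) | (!is.na(y) & y == 0)
  res[known_zero] <- 0L   # 0L keeps an integer result integer
  res
}

sparse_like <- mult_assuming_finite(c(0, 2, 0), c(NA, NA, 5))
stopifnot(identical(sparse_like, c(0, NA, 0)))
```

Where neither factor is a known zero, the NA is still propagated as
usual.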
