[Rd] Should 0L * NA_integer_ be 0L?

Michael Chirico m|ch@e|ch|r|co4 @end|ng |rom gm@||@com
Sat May 23 14:00:31 CEST 2020


OK, so maybe one way to paraphrase:

For R, the boundedness of integer vectors is an implementation detail,
rather than a deeper mathematical fact that can be exploited for this
case.

One might then also expect that integer overflow wouldn't result in NA,
but would instead be promoted automatically to numeric? And that this
doesn't happen for efficiency reasons?

Would it make any sense to have a different carveout for the logical
case? For logical, storage as integer might be seen as a similar type
of implementation detail (though if we're being this strict, the
question arises of what multiplication of logical values even means).

FALSE * NA = 0L
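
For concreteness, a minimal sketch of the current behaviour that the
questions above refer to, assuming a standard R build with 32-bit
integers (the variable names are only illustrative):

```r
# Integer overflow gives NA (with a warning) instead of promoting to double.
big <- .Machine$integer.max
overflowed <- suppressWarnings(big + 1L)
is.na(overflowed)      # TRUE
typeof(overflowed)     # "integer"

# Logical operands are coerced to integer before arithmetic, so the
# product of two logicals is an integer, and currently NA here.
logical_product <- FALSE * NA
typeof(logical_product)   # "integer"
is.na(logical_product)    # TRUE

# The integer case from the subject line behaves the same way today.
integer_product <- 0L * NA_integer_
is.na(integer_product)    # TRUE
```

The carveout floated above would change only the last two results to 0L;
the overflow behaviour is shown purely for comparison.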


On Sat, May 23, 2020 at 6:49 PM Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>
> >>>>> Michael Chirico
> >>>>>     on Sat, 23 May 2020 18:08:22 +0800 writes:
>
>     > I don't see this specific case documented anywhere (I also tried to search
>     > the r-devel archives, as well as I could); the only close reference
>     > mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this
>     > snippet from R-lang:
>
>     >> In cases where the result of the operation would be the same for all
>     >> possible values the NA could take, the operation may return this value.
>
>     > This begs the question -- shouldn't 0L * NA_integer_ be 0L?
>
>     > Because this is an integer operation, and according to this definition of
>     > NA:
>
>     >> Missing values in the statistical sense, that is, variables whose value
>     >> is not known, have the value NA
>
>     > NA_integer_ should be an unknown integer value between -2^31+1 and 2^31-1.
>     > Multiplying any of these values by 0 results in 0 -- that is, the result of
>     > the operation would be 0 for all possible values the NA could take.
>
>
>     > This came up from what seems like an inconsistency to me:
>
>     > all(NA, FALSE)
>     > # [1] FALSE
>     > NA * FALSE
>     > # [1] NA
>
>     > I agree with all(NA, FALSE) being FALSE because we know for sure that all
>     > cannot be true. The same can be said of the multiplication -- whether NA
>     > represents TRUE or FALSE, the resulting value is 0 (FALSE).
>
>     > I also agree with the numeric case, FWIW: NA_real_ * 0 has to be NA_real_,
>     > because NA_real_ could be Inf or NaN, for both of which multiplication by 0
>     > gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be NA_real_.
>
> I agree about almost everything you say above. ...
> but possibly the main conclusion.
>
> The problem with your proposed change would be that  integer
> arithmetic gives a different result than the corresponding
> "numeric" computation.
> (I don't remember other such cases in R, at least as long as the
>  integer arithmetic does not overflow.)
>
> One principle to decide such problems in S and R has been that
> the user should typically *not* have to know if their data is
> stored in float/double or in integer, and the results should be the same
> (possibly apart from staying integer for some operations).
>
>
> {{Note that there are also situations where it's really
>   undesirable that    0 * NA   does *not* give 0 (but NA);
>   notably in sparse matrix operations where you very often can
>   know that NA was not Inf (or NaN) and you really would like to
>   preserve sparseness ...}}
>
>
>     > [[alternative HTML version deleted]]
>
>     (as you did not use plain text ..)
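
The cases compared in the quoted message can be checked directly. This is
only a sketch of the behaviour described there, assuming IEEE 754 doubles;
the variable names are illustrative:

```r
# Logical operations return a definite value when it does not depend on NA.
and_result <- NA & FALSE        # FALSE
or_result  <- NA | TRUE         # TRUE
all_result <- all(NA, FALSE)    # FALSE

# In double arithmetic, 0 times a non-finite value is NaN, which is why
# 0 * NA_real_ cannot safely be 0.
zero_times_inf <- 0 * Inf       # NaN
zero_times_nan <- 0 * NaN       # NaN
zero_times_na  <- 0 * NA_real_  # missing

# Integer arithmetic currently matches the double result in missingness.
int_zero_times_na <- 0L * NA_integer_
same_missingness <- identical(is.na(int_zero_times_na), is.na(zero_times_na))
```

With the proposed change, same_missingness would become FALSE, which is
the integer/double discrepancy raised in the quoted reply.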


