[Rd] 1954 from NA

brodie gaslam
Sun May 23 17:19:04 CEST 2021


> On Sunday, May 23, 2021, 10:45:22 AM EDT, Adrian Dușa <dusa.adrian using gmail.com> wrote:
>
> On Sun, May 23, 2021 at 4:33 PM brodie gaslam via R-devel <r-devel using r-project.org> wrote:
> > I should add, I don't know that you can rely on this
> > particular encoding of R's NA.  If I were trying to restore
> > an NA from some external format, I would just generate an
> > R NA via e.g. NA_real_ in the R session I'm restoring the
> > external data into, and not try to hand assemble one.
>
> Thanks for your answer, Brodie, especially on Sunday (much appreciated).

Maybe I shouldn't answer on Sunday given I've said several wrong things...

> The aim is not to reconstruct an NA, but to "tag" an NA (and yes, I was
> referring to an NA_real_ of course), as seen in action here:
> https://github.com/tidyverse/haven/blob/master/src/tagged_na.c
>
> That code:
> - preserves the first part 0x7ff0
> - preserves the last part 1954
> - adds one additional byte to store (tag) a character provided in the SEXP vector
>
> That is precisely my understanding, that doubles starting with 0x7ff are
> all NaNs. My question was related to the additional part 1954 from the
> low bits: why does it need 32 bits?

It probably doesn't need 32 bits.  The code is trying to set all 64 bits.
It seems natural to do the high 32 bits, and then the low.  But I'm not R
Core so don't listen to me too closely.
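
To make the "high 32 bits, then low 32 bits" idea concrete, here is a
minimal C sketch.  It assumes a 64-bit IEEE 754 double and the bit pattern
discussed in this thread (high word 0x7ff00000, low word 1954), which is
not a documented guarantee.  The helper names are hypothetical and are not
taken from R's sources.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a double from its high and low 32-bit words.
 * Assumes double is a 64-bit IEEE 754 value. */
double double_from_words(uint32_t high, uint32_t low)
{
    uint64_t bits = ((uint64_t)high << 32) | (uint64_t)low;
    double d;
    memcpy(&d, &bits, sizeof d);  /* copies the bit pattern, no numeric conversion */
    return d;
}

/* Hypothetical helper: read back the raw 64-bit pattern of a double. */
uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* NaN test on the bit pattern: exponent field all ones, mantissa non-zero. */
int bits_are_nan(uint64_t bits)
{
    return ((bits >> 52) & 0x7ffu) == 0x7ffu
        && (bits & 0x000fffffffffffffULL) != 0;
}
```

With these helpers, double_to_bits(double_from_words(0x7ff00000u, 1954u))
would be expected to come back as 0x7ff00000000007a2 (1954 is 0x7a2),
provided the platform leaves the NaN payload alone while the value is
passed around.  Only 11 of the low 32 bits are needed to hold 1954.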

> The binary value of 1954 is 11110100010, which is represented by 11 bits
> occupying at most 2 bytes... So why does it need 4 bytes?
>
> Re. the possible overflow, I am not sure: 0x7ff0 is the decimal 32752,
> or the binary 111111111110000.

You are right, I had a moment and wrongly counted hex digits as bytes
instead of half-bytes.

> That is just about enough to fit in the available 16 bits (actually 15
> to leave one for the sign bit), so I don't really understand why it
> would. And in any case, the union definition uses an unsigned short
> which (if my understanding is correct) should certainly not overflow:
>
> typedef union
> {
>     double value;
>     unsigned short word[4];
> } ieee_double;
>
> What is gained with this proposal: 16 additional bits to do something
> with. For the moment, only 16 are available (from the lower part of the
> high 32 bits). If the value 1954 would be checked as a short instead of
> an int, the other 16 bits would become available. And those bits could
> be extremely valuable to tag multi-byte characters, for instance, but
> also higher numbers than 32767.

Note that the stability of the payload portion of NaNs is questionable:

https://developer.r-project.org/Blog/public/2020/11/02/will-r-work-on-apple-silicon/#nanan-payload-propagation

Also, if I understand correctly, you would be asking R core to formalize
the internal representation of the R NA, which I don't think is public?
So that you can use those internal bits for your own purposes with a
guarantee that R will not disturb them?  Obviously only they can answer
that.
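
For concreteness, the proposal quoted above (comparing 1954 as a 16-bit
value rather than a 32-bit one) might look like the sketch below.  This is
only an illustration under assumptions: a 64-bit IEEE 754 double, the bit
pattern discussed in this thread, and hypothetical function names.  It is
not R's or haven's code, and nothing here guarantees that the payload
survives arithmetic or a trip through R.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a double (assumes a 64-bit IEEE 754 double). */
uint64_t na_bits_of(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* NaN test on the bits: exponent field all ones, mantissa non-zero. */
int na_bits_nan(uint64_t bits)
{
    return ((bits >> 52) & 0x7ffu) == 0x7ffu
        && (bits & 0x000fffffffffffffULL) != 0;
}

/* Check with a 32-bit comparison: the whole low word must equal 1954. */
int is_na_low32(double d)
{
    uint64_t bits = na_bits_of(d);
    return na_bits_nan(bits) && (uint32_t)bits == 1954u;
}

/* Proposed check: only the lowest 16 bits must equal 1954. */
int is_na_low16(double d)
{
    uint64_t bits = na_bits_of(d);
    return na_bits_nan(bits) && (uint16_t)bits == 1954u;
}

/* Hypothetical tagging: store a 16-bit tag in bits 16..31 of the low word. */
double make_tagged_na16(uint16_t tag)
{
    uint64_t bits = ((uint64_t)0x7ff00000u << 32)
                  | ((uint64_t)tag << 16)
                  | 1954u;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Recover the 16-bit tag from bits 16..31. */
uint16_t tag_of_na16(double d)
{
    return (uint16_t)(na_bits_of(d) >> 16);
}
```

In this sketch a value made with a non-zero tag fails the 32-bit check but
passes the 16-bit one, which is exactly why the check itself would have to
change before those bits could be used.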

Apologies for confusing the issue.

B,

PS: the other obviously wrong thing I said was that the NA was 0x7ff0 0000 &
1954 when it is really 0x7ff0 0000 0000 0000 & 1954.


