[Rd] 1954 from NA

Adrian Dușa
Sun May 23 16:45:10 CEST 2021


On Sun, May 23, 2021 at 4:33 PM brodie gaslam via R-devel <
r-devel using r-project.org> wrote:

> I should add, I don't know that you can rely on this
> particular encoding of R's NA.  If I were trying to restore
> an NA from some external format, I would just generate an
> R NA via e.g. NA_real_ in the R session I'm restoring the
> external data into, and not try to hand assemble one.
>

Thanks for your answer, Brodie, especially on a Sunday (much appreciated).
The aim is not to reconstruct an NA, but to "tag" an NA (and yes, I was
referring to an NA_real_ of course), as seen in action here:
https://github.com/tidyverse/haven/blob/master/src/tagged_na.c

That code:
- preserves the first part 0x7ff0
- preserves the last part 1954
- adds one additional byte to store (tag) a character provided in the SEXP
vector
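
To make that layout concrete, here is a minimal sketch in C (not haven's
actual code). It assumes the NA_real_ bit pattern with high word 0x7ff00000
and low word 1954, and it places the tag in the lowest byte of the high
32-bit word, which is my reading of the linked file. The names make_tagged,
get_tag and low_word_is_1954 are hypothetical, used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit pattern of NA_real_: high 32 bits 0x7ff00000,
   low 32 bits 1954 (0x7a2). */
#define NA_BITS UINT64_C(0x7FF00000000007A2)

/* Hypothetical helper: build an NA-like NaN carrying a one-byte tag
   in the lowest byte of the high 32-bit word (bits 32..39). */
static double make_tagged(unsigned char tag)
{
    uint64_t bits = NA_BITS | ((uint64_t)tag << 32);
    double out;
    assert(sizeof out == sizeof bits);
    memcpy(&out, &bits, sizeof out);   /* copy the bits, no conversion */
    return out;
}

/* Hypothetical helper: read the tag byte back from bits 32..39. */
static unsigned char get_tag(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (unsigned char)((bits >> 32) & 0xFF);
}

/* The low 32 bits are untouched by the tag, so a test that compares
   the whole low word with 1954 still succeeds. */
static int low_word_is_1954(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)bits == 1954u;
}
```

For example, make_tagged('a') gives a NaN for which get_tag returns 'a' and
low_word_is_1954 is true. Going through memcpy and a 64-bit integer avoids
any dependence on byte order.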

That is precisely my understanding, that doubles starting with 0x7ff are
all NaNs (apart from infinity, where all the remaining bits are zero). My
question was related to the additional part 1954 from the low bits: why
does it need 32 bits?

The binary value of 1954 is 11110100010, which is represented by 11 bits
occupying at most 2 bytes... So why does it need 4 bytes?

Re. the possible overflow, I am not sure: 0x7ff0 is 32752 in decimal, or
111111111110000 in binary.
That just fits in the available 16 bits (actually 15, leaving one for the
sign bit), so I don't really understand why it would overflow. And
in any case, the union definition uses an unsigned short which (if my
understanding is correct) should certainly not overflow:

typedef union
{
    double value;
    unsigned short word[4];
} ieee_double;
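
As a rough check of the arithmetic above, the following sketch fills such a
union with 0x7ff0 and 1954 and reads the 16-bit pieces back. It assumes an
8-byte IEEE 754 double, a 16-bit unsigned short, and a plain little- or
big-endian machine. The helpers na_via_shorts and piece16 are hypothetical
names of mine, not taken from R or haven.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

typedef union
{
    double value;
    unsigned short word[4];
} ieee_double;

/* Hypothetical helper: assemble the NaN from 16-bit pieces. */
static double na_via_shorts(void)
{
    ieee_double probe, x;
    int hi, lo;

    assert(sizeof(double) == 4 * sizeof(unsigned short));
    assert(0x7ff0 <= USHRT_MAX);       /* 32752 fits in an unsigned short */

    /* 1.0 is 0x3ff0000000000000, so the piece holding 0x3ff0 is the
       most significant one: word[3] on little-endian, word[0] otherwise. */
    probe.value = 1.0;
    hi = (probe.word[3] == 0x3ff0) ? 3 : 0;
    lo = 3 - hi;

    x.word[0] = x.word[1] = x.word[2] = x.word[3] = 0;
    x.word[hi] = 0x7ff0;               /* exponent bits all set */
    x.word[lo] = 1954;                 /* 0x07a2, needs only 11 bits */
    return x.value;
}

/* Hypothetical helper: k-th 16-bit piece of a double, k = 0 being the
   most significant and k = 3 the least, independent of byte order. */
static unsigned short piece16(double x, int k)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (unsigned short)((bits >> (16 * (3 - k))) & 0xFFFFu);
}
```

With these, piece16(na_via_shorts(), 3) is 1954 and the two middle pieces
are zero, which is the unused space the question is about.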

What is gained with this proposal: 16 additional bits to do something with.
For the moment, only 16 are available (from the lower part of the high 32
bits). If the value 1954 were checked as a short instead of an int, the
other 16 bits (the upper half of the low 32 bits) would become available.
Those bits could be extremely valuable for tagging multi-byte characters,
for instance, but also numbers higher than 32767.
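
A sketch of the difference between the two checks, under the same
assumptions about the bit layout as above. This only illustrates the
proposal and does not describe how R behaves today. The names is_na_32,
is_na_16 and with_payload16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double x)
{
    uint64_t bits;
    assert(sizeof x == sizeof bits);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Check in the current style: a NaN whose whole low 32-bit word is 1954. */
static int is_na_32(double x)
{
    return x != x && (uint32_t)bits_of(x) == 1954u;
}

/* Check in the proposed style: a NaN whose lowest 16 bits are 1954. */
static int is_na_16(double x)
{
    return x != x && (uint16_t)bits_of(x) == 1954u;
}

/* Hypothetical helper: store a 16-bit payload in bits 16..31, the part
   of the low word that the 16-bit check would no longer look at. */
static double with_payload16(uint16_t payload)
{
    uint64_t bits = UINT64_C(0x7FF0000000000000)
                  | ((uint64_t)payload << 16)
                  | UINT64_C(1954);
    double out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

In this sketch a value such as with_payload16(0xBEEF) passes is_na_16 but
fails is_na_32, so any code that still compares the full 32-bit word would
treat it as an ordinary NaN rather than an NA.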

Best wishes,
Adrian
