[Rd] Reading 64-bit integers

Tue Mar 29 23:40:52 CEST 2011

Jon,

On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:

> Dear Simon,
> 
> Thank you for the response.
> 
> On 29 March 2011 15:06, Simon Urbanek <simon.urbanek at r-project.org> wrote:
>> 
>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>> 
>>> Dear all,
>>> 
>>> I see from some previous threads that support for 64-bit integers in R
>>> may be an aim for future versions, but in the meantime I'm wondering
>>> whether it is possible to read in integers of greater than 32 bits at
>>> all. Judging from ?readBin, it should be possible to read 8-byte
>>> integers to some degree, but it is clearly limited in practice by R's
>>> internally 32-bit integer type:
>>> 
>>>> x <- as.raw(c(0,0,0,0,1,0,0,0))
>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>> [1] 16777216
>>>> x <- as.raw(c(0,0,0,1,0,0,0,0))
>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>> [1] 0
>>> 
>>> For values that fit into 32 bits it works fine, but for larger values
>>> it fails. (I'm a bit surprised by the zero - should the value not be
>>> NA if it is out of range?
>> 
>> No, it's not out of range - int is only 4 bytes, so only the first 4 bytes (taken in endianness order, i.e. the least significant ones) are used.
> 
> The fact remains that I ask for the value of an 8-byte integer and
> don't get it.

I think you're misinterpreting the documentation:

     If ‘size’ is specified and not the natural size of the object,
     each element of the vector is coerced to an appropriate type
     before being written or as it is read.

The "integer" object type is defined as signed 32-bit in R, so if you ask for "8 bytes into object type integer", you get a coercion into that object type (a 32-bit signed integer), as documented. I think the confusion comes from equating the object type "integer" with a general "integer number" in the mathematical sense, which has no representation restrictions. (FWIW, the C "integer" type is "int", and it is 32 bits wide on all modern OSes regardless of platform. That's where the limitation comes from; it's not something R has made up.)
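
To make the coercion concrete, here is a minimal sketch. It assumes a platform where readBin() accepts size=8 for integers, and the variable names are only illustrative. It reads the two 8-byte values from the original post and compares the second with a plain 4-byte read of its low-order word:

```r
# 2^24 fits into 32 bits; 2^32 does not.
small <- as.raw(c(0, 0, 0, 0, 1, 0, 0, 0))
large <- as.raw(c(0, 0, 0, 1, 0, 0, 0, 0))

# Reading 8 bytes into the 32-bit "integer" type keeps only the low 32 bits.
low_small <- readBin(small, "integer", n = 1L, size = 8L, endian = "big")
low_large <- readBin(large, "integer", n = 1L, size = 8L, endian = "big")

# Reading the low-order word of 'large' directly gives the same value.
low_word <- readBin(large[5:8], "integer", n = 1L, size = 4L, endian = "big")

print(c(low_small, low_large, low_word))
print(.Machine$integer.max)  # largest value the "integer" type can hold
```

The high-order word is simply dropped by the cast, which is why the second value comes back as 0 rather than NA.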

> Pretending that it's really only four bytes because of
> the limits of R's integer type isn't all that helpful. Perhaps a
> warning should be put out if the cast will affect the value of the
> result? It looks like the relevant lines in src/main/connections.c are
> 3689-3697 in the current alpha:
> 
> #if SIZEOF_LONG == 8
> 		    case sizeof(long):
> 			INTEGER(ans)[i] = (int)*((long *)buf);
> 			break;
> #elif SIZEOF_LONG_LONG == 8
> 		    case sizeof(_lli_t):
> 			INTEGER(ans)[i] = (int)*((_lli_t *)buf);
> 			break;
> #endif
> 
>>> ) The value can be represented as a double,
>>> though:
>>> 
>>>> 4294967296
>>> [1] 4294967296
>>> 
>>> I wouldn't expect readBin() to return a double if an integer was
>>> requested, but is there any way to get the correct value out of it?
>> 
>> Trivially (for your unsigned big-endian case):
>> 
>> y <- readBin(x, "integer", n=length(x)/4L, endian="big")
>> y <- ifelse(y < 0, 2^32 + y, y)
>> i <- seq(1,length(y),2)
>> y <- y[i] * 2^32 + y[i + 1L]
> 
> Thanks for the code, but I'm not sure I would call that trivial,
> especially if one needs to cater for little endian and signed cases as
> well!

I was talking about your case, and it is trivial in the sense that you read the data as integers, convert to double precision and add.
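
For the little-endian and signed cases the same three steps still apply. The following is only a sketch: the function name read_int64_as_double and its arguments are made up for illustration, signed input is assumed to be two's complement, and the result is exact only for magnitudes up to 2^53 because it is a double. It also handles the word 0x80000000, which readBin() returns as NA because that bit pattern is NA_integer_:

```r
# Hypothetical helper: decode 8-byte integers from a raw vector into doubles.
read_int64_as_double <- function(x, endian = c("big", "little"), signed = FALSE) {
  endian <- match.arg(endian)
  stopifnot(is.raw(x), length(x) %% 8L == 0L)

  # Step 1: read as 32-bit signed words, the natural size of "integer".
  w <- as.numeric(readBin(x, "integer", n = length(x) / 4L, size = 4L,
                          endian = endian))

  # The bit pattern 0x80000000 is NA_integer_ in R; restore its value.
  w[is.na(w)] <- -2^31

  # Step 2: reinterpret every word as unsigned, now in double precision.
  neg <- w < 0
  w[neg] <- w[neg] + 2^32

  # Pair up the words; which one is the high word depends on byte order.
  first <- 2L * seq_len(length(w) / 2L) - 1L
  if (endian == "big") {
    hi <- w[first]
    lo <- w[first + 1L]
  } else {
    lo <- w[first]
    hi <- w[first + 1L]
  }

  # For signed input, a set top bit in the high word means value - 2^64.
  if (signed) {
    top <- hi >= 2^31
    hi[top] <- hi[top] - 2^32
  }

  # Step 3: combine the two halves.
  hi * 2^32 + lo
}

x <- as.raw(c(0, 0, 0, 1, 0, 0, 0, 0))
print(read_int64_as_double(x, endian = "big"))
print(read_int64_as_double(as.raw(rep(255, 8)), endian = "little", signed = TRUE))
```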

> This is what I meant by reconstructing the number manually...
> 

You didn't say so - you were talking about reconstructing it from a raw vector, which seems a lot more painful since you can't compute with enough precision on raw vectors.

Cheers,
Simon