[Rd] Reading 64-bit integers

Wed Apr 13 16:05:08 CEST 2011

Simon (et al.),

I was just wondering if anything further came of this... I would be
willing to help put together an updated patch, if the semantics can be
decided upon.

All the best,
Jon
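
For concreteness, here is a minimal sketch in plain R of what reading 8-byte integers into doubles could look like, along the lines of the output=double() idea discussed below. The function name read_int64_as_double and its arguments are hypothetical; this is neither the proposed patch nor an existing readBin() feature. It assumes the input is a raw vector and assembles each value from its bytes, warning when a value lies outside the range a double can represent exactly.

```r
# Hypothetical helper (not part of base R): interpret a raw vector as
# 8-byte integers and return them as doubles.
read_int64_as_double <- function(x, n = length(x) %/% 8L, signed = TRUE,
                                 endian = c("big", "little")) {
  endian <- match.arg(endian)
  stopifnot(is.raw(x), n >= 0, length(x) >= 8 * n)
  if (n == 0) return(numeric(0))
  # One column per value, one row per byte, each byte as unsigned 0..255.
  b <- matrix(as.integer(x[seq_len(8 * n)]), nrow = 8)
  # Make row 1 the most significant byte regardless of storage order.
  if (endian == "little") b <- b[8:1, , drop = FALSE]
  hi <- colSums(b[1:4, , drop = FALSE] * 256^(3:0))  # upper 32 bits, unsigned
  lo <- colSums(b[5:8, , drop = FALSE] * 256^(3:0))  # lower 32 bits, unsigned
  # Two's complement: a set top bit makes the upper word negative.
  if (signed) hi <- ifelse(hi >= 2^31, hi - 2^32, hi)
  # Values outside [-2^53, 2^53) are not all exactly representable.
  if (any(hi >= 2^21 | hi < -2^21))
    warning("some values need more than 53 bits and may have lost precision")
  hi * 2^32 + lo
}
```

With the second raw vector from the original example, as.raw(c(0,0,0,1,0,0,0,0)), calling read_int64_as_double(x, signed = FALSE, endian = "big") gives 4294967296 rather than 0. Working byte by byte avoids the case where a 4-byte word equal to 0x80000000 would be read by readBin() as NA.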

On 30 March 2011 19:22, Simon Urbanek <simon.urbanek at r-project.org> wrote:
> Bill,
>
> thanks. I like that idea of the output parameter better, especially if we ever add different scalar vector types. Admittedly, what=integer() is the most useful case. What I was worried about is things like what=double(), output=integer() which could be legal, but are more conveniently dealt with via as.integer(readBin()) instead.
> I won't have more time today, but I'll have a look tomorrow.
>
> Thanks,
> Simon
>
>
> On Mar 30, 2011, at 1:38 PM, William Dunlap wrote:
>
>>
>>> -----Original Message-----
>>> From: r-devel-bounces at r-project.org
>>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Simon Urbanek
>>> Sent: Tuesday, March 29, 2011 6:49 PM
>>> To: Duncan Murdoch
>>> Cc: r-devel at r-project.org
>>> Subject: Re: [Rd] Reading 64-bit integers
>>>
>>>
>>> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
>>>
>>>> On 29/03/2011 7:01 PM, Jon Clayden wrote:
>>>>> Dear Simon,
>>>>>
>>>>> On 29 March 2011 22:40, Simon Urbanek <simon.urbanek at r-project.org> wrote:
>>>>>> Jon,
>>>>>>
>>>>>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
>>>>>>
>>>>>>> Dear Simon,
>>>>>>>
>>>>>>> Thank you for the response.
>>>>>>>
>>>>>>> On 29 March 2011 15:06, Simon Urbanek <simon.urbanek at r-project.org> wrote:
>>>>>>>>
>>>>>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> I see from some previous threads that support for 64-bit integers
>>>>>>>>> in R may be an aim for future versions, but in the meantime I'm
>>>>>>>>> wondering whether it is possible to read in integers of greater
>>>>>>>>> than 32 bits at all. Judging from ?readBin, it should be possible
>>>>>>>>> to read 8-byte integers to some degree, but it is clearly limited
>>>>>>>>> in practice by R's internally 32-bit integer type:
>>>>>>>>>
>>>>>>>>> > x <- as.raw(c(0,0,0,0,1,0,0,0))
>>>>>>>>> > (readBin(x, "integer", n=1, size=8, signed=FALSE, endian="big"))
>>>>>>>>> [1] 16777216
>>>>>>>>> > x <- as.raw(c(0,0,0,1,0,0,0,0))
>>>>>>>>> > (readBin(x, "integer", n=1, size=8, signed=FALSE, endian="big"))
>>>>>>>>> [1] 0
>>>>>>>>>
>>>>>>>>> For values that fit into 32 bits it works fine, but for larger
>>>>>>>>> values it fails. (I'm a bit surprised by the zero - should the
>>>>>>>>> value not be NA if it is out of range?
>>>>>>>>
>>>>>>>> No, it's not out of range - int is only 4 bytes so only the first
>>>>>>>> 4 bytes (in endianness order, hence the least significant ones)
>>>>>>>> are used.
>>>>>>>
>>>>>>> The fact remains that I ask for the value of an 8-byte integer and
>>>>>>> don't get it.
>>>>>>
>>>>>> I think you're misinterpreting the documentation:
>>>>>>
>>>>>>    If 'size' is specified and not the natural size of the object,
>>>>>>    each element of the vector is coerced to an appropriate type
>>>>>>    before being written or as it is read.
>>>>>>
>>>>>> The "integer" object type is defined as signed 32-bit in R, so if
>>>>>> you ask for "8 bytes into object type integer", you get a coercion
>>>>>> into that object type -- 32-bit signed integer -- as documented. I
>>>>>> think the issue may come from the confusion of the object type
>>>>>> "integer" with general "integer number" in mathematical sense that
>>>>>> has no representation restrictions. (FWIW in C the "integer" type is
>>>>>> "int" and it is 32-bit on all modern OSes regardless of platform -
>>>>>> that's where the limitation comes from, it's not something R has
>>>>>> made up).
>>>>>
>>>>> OK, but it still seems like there is a case for raising a warning. As
>>>>> it is there is no way to tell when reading an 8-byte integer from a
>>>>> file whether its value is really 0, or if it merely has 0 in its
>>>>> least-significant 4 bytes. If 99% of such stored numbers are below
>>>>> 2^31, one is going to need some extra logic to catch the other 1%
>>>>> where you (silently) get the wrong value. In essence, unless you're
>>>>> certain that you will never come across a number that actually uses
>>>>> the upper 4 bytes, you will always have to read it as two 4-byte
>>>>> numbers and check that the high-order one (which is endianness
>>>>> dependent, of course) is zero. A C-level sanity check seems more
>>>>> efficient and more helpful to me.
>>>>
>>>> Seems to me that the S-PLUS solution (output="double") would be a lot
>>>> more useful.  I'd commit that if you write it; I don't think I'd commit
>>>> the warning.
>>>>
>>>
>>> I was going to write something similar (idea = good, patch
>>> welcome ;)). My only worry is that the "output" argument is a
>>> bit misleading in that one could expect to use any
>>> combination of "input"/"output" which may be a maintenance
>>> nightmare. If I understand it correctly it's only a special
>>> case for integer input. I don't have S+ so can't say how they
>>> deal with that.
>>
>> In S+'s readBin the output argument can be
>> only double() or single() when what is double()
>> or single() (S+ still has a real single
>> precision storage mode) and can be any
>> numeric type or logical when what is integer().
>>
>> The output=double() seemed like the only useful case.
>>
>> It does not warn when precision is lost in the 8-byte
>> integer to double conversion.  Perhaps it should.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>>
>>> Cheers,
>>> Simon
>>>
>>>
>>>>
>>>>>
>>>>>>> Pretending that it's really only four bytes because of
>>>>>>> the limits of R's integer type isn't all that helpful. Perhaps a
>>>>>>> warning should be put out if the cast will affect the value of the
>>>>>>> result? It looks like the relevant lines in src/main/connections.c
>>>>>>> are 3689-3697 in the current alpha:
>>>>>>>
>>>>>>> #if SIZEOF_LONG == 8
>>>>>>>                  case sizeof(long):
>>>>>>>                      INTEGER(ans)[i] = (int)*((long *)buf);
>>>>>>>                      break;
>>>>>>> #elif SIZEOF_LONG_LONG == 8
>>>>>>>                  case sizeof(_lli_t):
>>>>>>>                      INTEGER(ans)[i] = (int)*((_lli_t *)buf);
>>>>>>>                      break;
>>>>>>> #endif
>>>>>>>
>>>>>>>>> ) The value can be represented as a double,
>>>>>>>>> though:
>>>>>>>>>
>>>>>>>>> > 4294967296
>>>>>>>>> [1] 4294967296
>>>>>>>>>
>>>>>>>>> I wouldn't expect readBin() to return a double if an integer was
>>>>>>>>> requested, but is there any way to get the correct value out of it?
>>>>>>>>
>>>>>>>> Trivially (for your unsigned big-endian case):
>>>>>>>>
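>>>>>>>> # read as signed 32-bit words, most significant word first
>>>>>>>> y <- readBin(x, "integer", n=length(x)/4L, endian="big")
>>>>>>>> # map negative words back to their unsigned values (as doubles)
>>>>>>>> y <- ifelse(y < 0, 2^32 + y, y)
>>>>>>>> # combine each (high, low) pair of words into one double
>>>>>>>> i <- seq(1, length(y), 2)
>>>>>>>> y <- y[i] * 2^32 + y[i + 1L]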
>>>>>>>
>>>>>>> Thanks for the code, but I'm not sure I would call that trivial,
>>>>>>> especially if one needs to cater for little endian and signed cases
>>>>>>> as well!
>>>>>>
>>>>>> I meant for your case, and trivial as in: read as integers, convert
>>>>>> to double precision and add.
>>>>>>
>>>>>>
>>>>>>> This is what I meant by reconstructing the number manually...
>>>>>>>
>>>>>>
>>>>>> You didn't say so - you were talking about reconstructing it from a
>>>>>> raw vector which seems a lot more painful since you can't compute
>>>>>> with enough precision on raw vectors.
>>>>>
>>>>> True - I should have been more specific. Sorry.
>>>>>
>>>>> Jon
>>>>>
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel