[Rd] Reading an "unsigned long long" using R readBin()

Fri May 30 22:20:29 CEST 2008

On Fri, 30 May 2008, Prof Brian Ripley wrote:

> On Fri, 30 May 2008, Duncan Murdoch wrote:
>
> > On 5/30/2008 1:55 PM, Prof Brian Ripley wrote:
> >> Well, R has no unsigned quantities, so ultimately you can't actually do
> >> this.  But using what="int" and an appropriate 'size' (likely to be 8)
> >> should read the numbers, wrapping around very large ones to be negative.
> >> (The usual trick of storing integers in numeric will lose accuracy, but
> >> might be better than nothing.)
> >
> > I think reading size 8 integers on 32 bit Windows returns signed 32 bit
> > integers, with values outside that range losing the high order bits, not just
> > accuracy.  At least that's what I see when I write the numbers 1:10 out as 4
> > byte integers, and read them as 8 byte integers:  I get 1 3 5 7 9.
>
> Yes, that's true for even larger ones.
>
> So to clarify: up to 2^31-1 should work, thereafter you will get the lower
> 32 bits and hence possibly a signed number.

When we wrote a version of readBin() for Splus 8.0 we added an
extra argument, output=, that specifies the type of S object
to put the result into.  The what= argument says what sort
of data is in the input file and by default output=what.
output="double" can be useful in this case: a double has a
53-bit significand, so it can hold any signed or unsigned
integer of magnitude up to 2^53 exactly.  For a larger
integer the double keeps its 53 most significant bits
(rounded), which may be better than truncating it.

E.g., I wrote a C program to write some unsigned long longs to
a file:
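    #include <stdio.h>
    /* Write seven 8-byte unsigned integers, chosen to straddle 2^31,
       2^32 and 2^53, to stdout in the machine's native byte order. */
    int main(int argc, char *argv[])
    {
	    unsigned long long data[7], one = 1ULL ;
	    data[0] = one ;
	    data[1] = (one<<31) - 1 ;   /* largest signed 32-bit value */
	    data[2] = (one<<31) + 1 ;   /* negative if read as signed 32-bit */
	    data[3] = (one<<32) - 1 ;   /* largest unsigned 32-bit value */
	    data[4] = (one<<32) + 1 ;   /* needs more than 32 bits */
	    data[5] = (one<<52) + 1 ;   /* still exact in a double */
	    data[6] = (one<<54) + 1 ;   /* wider than a double's 53-bit significand */
	    (void)fwrite((void *)data, sizeof(data[0]), sizeof(data)/sizeof(data[0]), stdout) ;
	    return 0 ;
    }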

od shows what it writes, as unsigned, signed, and hex
8 byte integers:
    % ./a.out|od --format u8
    0000000                    1           2147483647
    0000020           2147483649           4294967295
    0000040           4294967297     4503599627370497
    0000060    18014398509481985
    0000070
    % ./a.out | od --format d8
    0000000                    1           2147483647
    0000020           2147483649           4294967295
    0000040           4294967297     4503599627370497
    0000060    18014398509481985
    0000070
    % ./a.out | od --format x8
    0000000 0000000000000001 000000007fffffff
    0000020 0000000080000001 00000000ffffffff
    0000040 0000000100000001 0010000000000001
    0000060 0040000000000001
    0000070

and in 32-bit Splus I can read it with:
    > z<-readBin(pipe("./a.out", open="br"), what="integer", n=7,
              size=8, signed=FALSE, output="double")
    > print(z, digits=16)
    [1]                 1        2147483647        2147483649        4294967295
    [5]        4294967297  4503599627370497 18014398509481984
Note that z[7] loses precision, since it is larger than 2^53.

Without output="double", numbers of 2^32 and above would be
truncated to their low 32 bits, and those between 2^31
and 2^32 would come out with the wrong sign:
    > readBin(pipe("./a.out", open="br"), what="integer", n=7,
              size=8, signed=FALSE)
    [1]           1  2147483647 -2147483647          -1           1           1
    [7]           1
(That one gives the same result in R and Splus.)

What do folks think about having this option in R?

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."