[Rd] Reading an "unsigned long long" using R readBin()

Bill Dunlap bill at insightful.com
Fri May 30 22:20:29 CEST 2008


On Fri, 30 May 2008, Prof Brian Ripley wrote:

> On Fri, 30 May 2008, Duncan Murdoch wrote:
>
> > On 5/30/2008 1:55 PM, Prof Brian Ripley wrote:
> >> Well, R has no unsigned quantities, so ultimately you can't actually do
> >> this.  But using what="int" and an appropriate 'size' (likely to be 8)
> >> should read the numbers, wrapping around very large ones to be negative.
> >> (The usual trick of storing integers in numeric will lose accuracy, but
> >> might be better than nothing.)
> >
> > I think reading size 8 integers on 32 bit Windows returns signed 32 bit
> > integers, with values outside that range losing the high order bits, not just
> > accuracy.  At least that's what I see when I write the numbers 1:10 out as 4
> > byte integers, and read them as 8 byte integers:  I get 1 3 5 7 9.
>
> Yes, that's true for even larger ones.
>
> So to clarify: up to 2^31-1 should work, thereafter you will get the lower
> 32 bits and hence possibly a signed number.

When we wrote a version of readBin() for Splus 8.0 we added an
extra argument, output=, that specifies the type of S object
to put the result into.  The what= argument says what sort
of data is in the input file and by default output=what.
output="double" can be useful in this case, as a double can
store a 53 bit signed or unsigned integer without loss of
precision.  If the integer is bigger than 2^53-1, the double
stores its most significant 53 bits, which may be better
than truncating the thing.
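
To make the 53-bit limit concrete, here is a minimal C sketch. It is not the Splus implementation: the function names are made up for illustration, and it assumes that double is an IEEE 754 64-bit type. It converts an unsigned 64-bit integer to double and reports whether the value survived the conversion unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert an unsigned 64-bit integer to double.
 * Assumes an IEEE 754 double with a 53-bit significand. */
double u64_to_double(uint64_t x)
{
    return (double)x;
}

/* Hypothetical helper: true when the conversion to double loses nothing. */
bool u64_is_exact_as_double(uint64_t x)
{
    double d = (double)x;

    /* Values near 2^64 round up to 2^64 itself, which cannot be
     * converted back to uint64_t, so treat them as inexact. */
    if (d >= 18446744073709551616.0)
        return false;

    /* Round trip: exact only if we get the same integer back. */
    return (uint64_t)d == x;
}
```

With these, (1ULL<<52) + 1 round-trips exactly, while (1ULL<<54) + 1 comes back as 2^54, because the double keeps only the 53 most significant bits.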

E.g., I wrote a C program to write some unsigned long longs to
a file:
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        /* Test values on either side of 2^31, 2^32 and 2^53. */
        unsigned long long data[7], one = 1ULL ;
        data[0] = one ;
        data[1] = (one<<31) - 1 ;
        data[2] = (one<<31) + 1 ;
        data[3] = (one<<32) - 1 ;
        data[4] = (one<<32) + 1 ;
        data[5] = (one<<52) + 1 ;
        data[6] = (one<<54) + 1 ;
        /* Write all 7 values to stdout as raw 8-byte integers. */
        (void)fwrite((void *)data, sizeof(data[0]), sizeof(data)/sizeof(data[0]), stdout) ;
        return 0 ;
    }

od shows what it writes, as unsigned, signed, and hex
8 byte integers:
    % ./a.out|od --format u8
    0000000                    1           2147483647
    0000020           2147483649           4294967295
    0000040           4294967297     4503599627370497
    0000060    18014398509481985
    0000070
    % ./a.out | od --format d8
    0000000                    1           2147483647
    0000020           2147483649           4294967295
    0000040           4294967297     4503599627370497
    0000060    18014398509481985
    0000070
    % ./a.out | od --format x8
    0000000 0000000000000001 000000007fffffff
    0000020 0000000080000001 00000000ffffffff
    0000040 0000000100000001 0010000000000001
    0000060 0040000000000001
    0000070

and in 32-bit Splus I can read it with:
    > z<-readBin(pipe("./a.out", open="br"), what="integer", n=7,
              size=8, signed=FALSE, output="double")
    > print(z, digits=16)
    [1]                 1        2147483647        2147483649        4294967295
    [5]        4294967297  4503599627370497 18014398509481984
Note that it loses precision where z[7]>2^53: 2^54+1 is read back as 2^54.

Without output="double", the numbers >= 2^32 are truncated to
their low 32 bits and the signs are wrong on the ones between
2^31 and 2^32:
    > readBin(pipe("./a.out", open="br"), what="integer", n=7,
              size=8, signed=FALSE)
    [1]           1  2147483647 -2147483647          -1           1           1
    [7]           1
(That one gives the same result in R and Splus.)
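
The truncated values above are what you get by keeping the low 32 bits of each 8-byte integer and interpreting them as a signed 32-bit integer. A minimal C sketch of that arithmetic follows. The function name is hypothetical, and this only illustrates the arithmetic; it is not a copy of what readBin() does internally.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 32 bits of x and interpret them
 * as a two's-complement signed 32-bit integer. */
int32_t low32_as_signed(uint64_t x)
{
    uint32_t low = (uint32_t)(x & 0xFFFFFFFFu);

    /* Bit patterns with the top bit clear are already non-negative. */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;

    /* Otherwise subtract 2^32 in a wider type, which avoids relying on
     * implementation-defined narrowing of out-of-range values. */
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

Applied to the seven values written by the C program, this gives 1, 2147483647, -2147483647, -1, 1, 1, 1, matching the output shown. Note that R uses the bit pattern of -2^31 for NA_integer_, so a value whose low 32 bits are 0x80000000 would show up as NA in R.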

What do folks think about having this option in R?

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."


