[Rd] How to handle INT8 data

Hadley Wickham h.wickham at gmail.com
Sat Jan 21 17:56:31 CET 2017


To summarise this thread, there are basically three ways of handling int64 in R:

* coerce to character
* coerce to double
* store in a double

There is no ideal solution, and each has pros and cons that I've
attempted to summarise below.

## Coerce to character

This is the easiest approach if the data is used as identifiers. It
will have some performance drawbacks when loading and will require
additional memory. It should not have negative performance
implications once the data has been loaded, because R has a global
string pool, so comparing two strings only requires a single pointer
comparison (assuming they have the same encoding).
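
As a rough illustration, here is a minimal base R sketch that keeps a few
of the ids from the example data in this thread as character strings. The
variable name `ids` is just for illustration.

```r
# Three ids from the example data in this thread, kept as strings.
ids <- c("-1311071933951566764", "5578000650960353108", "7675955260644241951")

# Nothing is parsed as a number, so every digit survives.
stopifnot(identical(ids[2], "5578000650960353108"))

# Equality tests and lookups work as usual on character vectors.
hit <- match("7675955260644241951", ids)       # position of the id: 3
same <- ids == "5578000650960353108"           # FALSE TRUE FALSE

# Duplicate detection, joins and grouping also work on the strings.
stopifnot(anyDuplicated(ids) == 0)
```

Note that sorting such a vector follows string collation, not numeric
order, which is usually fine for identifiers.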

## Coerce to double

This is the easiest approach if your integers are in the range
[-(2^53), 2^53] or you can tolerate some minor loss of precision.
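
A minimal base R sketch of where the precision loss starts, using one id
from the example data in this thread. The id next to it (ending in 9) is
made up for the comparison.

```r
# Whole numbers with magnitude below 2^53 are still distinguishable.
limit <- 2^53
stopifnot(limit - 1 != limit)

# Just above 2^53 adjacent doubles are 2 apart, so adding 1 is lost.
stopifnot(limit + 1 == limit)

# This id from the thread is far outside the exactly representable range.
id_chr <- "5578000650960353108"
id_dbl <- as.numeric(id_chr)
stopifnot(abs(id_dbl) > 2^53)

# A hypothetical neighbouring id (one larger) collapses onto the same double,
# so the two can no longer be told apart after coercion.
collides <- id_dbl == as.numeric("5578000650960353109")
stopifnot(collides)
```

So for ids like the ones in the original question, coercing to double is
only safe if collisions between nearby values do not matter.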

## Store in a double

This technique takes advantage of the fact that doubles and int64s are
the same size, so you can store the binary representation of an int64
in a double. This will effectively be garbage if you treat the vector
as if it is a double, so it requires adding an S3 class and overriding
every generic function with a custom method. Not all functions are
generic, and internal C code will not know about the special class, so
this has the danger of code silently interpreting the data
incorrectly.

This is the approach taken by the bit64 package (and, I believe, the
int64 package, but since that's been archived it's not worth
considering).
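
To show why the stored values look like garbage to ordinary code, here is
a minimal base R sketch that reinterprets the 8 bytes of the int64 value 1
as a double. It uses only `readBin()` and is not how bit64 is implemented;
it assumes the bytes are laid out least significant byte first.

```r
# The 8 bytes of the int64 value 1, least significant byte first.
int64_one <- as.raw(c(1, 0, 0, 0, 0, 0, 0, 0))

# Read the same 8 bytes back as if they were a double.
as_double <- readBin(int64_one, what = "double", n = 1, size = 8,
                     endian = "little")

# The integer 1 now looks like the smallest subnormal double, 2^-1074.
stopifnot(as_double == 2^-1074)

# Code that is unaware of the special class sees a number that is
# effectively zero, so adding 1 gives 1 rather than the intended 2.
stopifnot(as_double + 1 == 1)
stopifnot(round(as_double) == 0)
```

Any function that drops or ignores the class attribute will work on values
like `as_double` above, which is the silent misinterpretation risk.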

Hadley

On Fri, Jan 20, 2017 at 9:19 AM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
> I am not on R-core, so cannot speak to future plans to internally support
> int8 (though my impression is that there aren't any, at least none that are
> close to fruition).
>
> The standard way of dealing with whole numbers too big to fit in an integer
> is to put them in a numeric (double down in C land). This can represent
> integers up to 2^53 without loss of precision (see
> http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double).
> This is how long vector indices are (currently) implemented in R. If it's
> good enough for indices it's probably good enough for whatever you need
> them for.
>
> Hope that helps.
>
> ~G
>
>
> On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris <nicolas.paris at aphp.fr>
> wrote:
>
>> Hello r users,
>>
>> I have to deal with int8 data in R. AFAIK R only handles int4
>> with the `as.integer` function [1]. I wonder:
>> 1. What is the best approach to handle int8? `as.character`?
>> `as.numeric`?
>> 2. Is there any plan to handle int8 in the future? As you might know,
>> int4 is too small to deal with the earth's population right now.
>>
>> Thanks for your ideas,
>>
>> int8 eg:
>>
>>      human_id
>> ----------------------
>>  -1311071933951566764
>>  -4708675461424073238
>>  -6865005668390999818
>>   5578000650960353108
>>  -3219674686933841021
>>  -6469229889308771589
>>   -606871692563545028
>>  -8199987422425699249
>>   -463287495999648233
>>   7675955260644241951
>>
>> reference:
>> 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
>>
>> --
>> Nicolas PARIS
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
> --
> Gabriel Becker, PhD
> Associate Scientist (Bioinformatics)
> Genentech Research
>
>



-- 
http://hadley.nz


