[R] rounding down with as.integer

Mike Miller mbmiller+l at gmail.com
Thu Jan 1 19:21:42 CET 2015


On Thu, 1 Jan 2015, Duncan Murdoch wrote:

> On 31/12/2014 8:44 PM, David Winsemius wrote:
>>
>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
>>
>>> This is probably a FAQ, and I don't really have a question about it, but I just ran across this in something I was working on:
>>>
>>>> as.integer(1000*1.003)
>>> [1] 1002
>>>
>>> I didn't expect it, but maybe I should have.  I guess it's about the machine precision added to the fact that as.integer always rounds down:
>>>
>>>
>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps)
>>> [1] 1002
>>>
>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps)
>>> [1] 1003
>>>
>>>
>>> This does it right...
>>>
>>>> as.integer( round( 1000*1.003 ) )
>>> [1] 1003
>>>
>>> ...but this seems to always give the same answer and it is a little faster in my application:
>>>
>>>> as.integer( 1000*1.003 + .1 )
>>> [1] 1003
>>>
>>>
>>> FYI - I'm reading in a long vector of numbers from a text file with no more than three digits to the right of the decimal.  I'm converting them to integers and saving them in binary format.
>>>
>>
>> So just add 0.0001 or even .0000001 to all of them and coerce to integer.
>
> I don't think the original problem was stated clearly, so I'm not sure 
> whether this is a solution, but it looks wrong to me.  If you want to 
> round to the nearest integer, why not use round() (without the 
> as.integer afterwards)?  Or if you really do want an integer, why add 
> 0.1 or 0.0001, why not add 0.5 before calling as.integer()?  This is the 
> classical way to implement round().
>
> To state the problem clearly, I'd like to know what result is expected 
> for any real number x.  Since R's numeric type only approximates the 
> real numbers we might not be able to get a perfect match, but at least 
> we could quantify how close we get.  Or is the input really character 
> data?  The original post mentioned reading numbers from a text file.


Maybe you'd like to know what I'm really doing.  I have 1600 text files 
each with up to 16,000 lines with 3100 numbers per line, delimited by a 
single space.  The numbers are between 0 and 2, inclusive, and they have 
up to three digits to the right of the decimal.  Every possible value in 
that range will occur in the data.  Some examples numbers: 0 1 2 0.325 
1.12 1.9.  I want to multiply by 1000 and store them as 16-bit integers 
(uint16).

I've been reading in the data like so:

> data <- scan( file=FILE, what=double(), nmax=3100*16000)

At first I tried making the integers like so:

> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
    user  system elapsed
   0.187   0.387   0.574

I decided I should compare with the result I got using round():

> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ; proc.time()-ptm
    user  system elapsed
   1.595   0.757   2.352

It is a curious fact that only a few of the values from 0 to 2000 disagree 
between the two methods:

> table( ints2[ ints2 != ints ] )

  1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112

I understand that it's all about the problem of representing digital 
numbers in binary, but I still find some of the results a little 
surprising, like that list of numbers from the table() output.  For 
another example:

> 1000+3 - 1000*(1+3/1000)
[1] 1.136868e-13

> 3 - 1000*(0+3/1000)
[1] 0

> 2000+3 - 1000*(2+3/1000)
[1] 0

See what I mean?  So there is something special about the numbers around 
1000.

Back to the quesion at hand:  I can avoid use of round() and speed things 
up a little bit by just adding a small number after multiplying by 1000:

> ptm <- proc.time() ; R3 <- as.integer( 1000 * data + .1 ) ; proc.time()-ptm
    user  system elapsed
   0.224   0.594   0.818

You point out that adding .5 makes sense.  That is probably a better idea 
and I should take that approach under most conditions, but in this case we 
can add anything between 2e-13 and about 0.99999999999 and always get the 
same answer.  We also have to remember that if a number might be negative 
(not a problem for me in this application), we need to subtract 0.5 
instead of adding it.

Anyway, right now this is what I'm actually doing:

> con <- file( paste0(FILE, ".uint16"), "wb" )
> ptm <- proc.time() ; writeBin( as.integer( 1000 * scan( file=FILE, what=double(), nmax=3100*16000 ) + .1 ), con, size=2 ) ; proc.time()-ptm
Read 48013406 items
    user  system elapsed
  10.263   0.733  10.991
> close(con)

By the way, writeBin() is something that I learned about here, from you, 
Duncan.  Thanks for that, too.

Mike

-- 
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ



More information about the R-help mailing list