[R] rounding down with as.integer

Mike Miller mbmiller+l at gmail.com
Thu Jan 1 20:58:02 CET 2015


I'd have to say thanks, but no thanks, to that one!  ;-)  The problem is 
that it would take a long time and still give the same answer.

The first time I did this kind of thing, a year or two ago, I manipulated 
the text data to produce integers before putting the data into R.  The 
data were a little different -- already zero padded with three digits to 
the right of the decimal and one to the left, so all I had to do was drop 
the decimal point.  The as.integer(1000*x+.5) method is very fast and it 
works great.
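
In case it is useful to anyone, the text manipulation amounted to little
more than dropping the "." -- a minimal sketch with made-up values, not the
real files:

## values already zero-padded: one digit left of the decimal, three right
txt  <- scan(text="0.000 1.120 1.900 0.325", what="")
ints <- as.integer(sub(".", "", txt, fixed=TRUE))   ## 0 1120 1900 325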

I could have done that this time, but I was also saving to other formats, 
so I had the data already in the format I described.

Mike


On Thu, 1 Jan 2015, Richard M. Heiberger wrote:

> Interesting.  Following what someone else said on this list today, the goal
> is to input the data correctly.
> My inclination would be to read the file as text, pad each number to
> the right, drop the decimal point,
> and then read it as an integer.
> 0 1 2 0.325 1.12 1.9
> 0.000 1.000 2.000 0.325 1.120 1.900
> 0000 1000 2000 0325 1120 1900
>
> The pad step is the interesting step.
>
> ## 0 1 2 0.325 1.12 1.9
> ## 0.000 1.000 2.000 0.325 1.120 1.900
> ## 0000 1000 2000 0325 1120 1900
>
> x.in <- scan(text="
> 0 1 2 0.325 1.12 1.9 1.
> ", what="")
>
> ## pick a pad by width: nchar 1 (e.g. "0") needs ".000", nchar 5 ("0.325") none
> padding <- c(".000", "000", "00", "0", "")
>
> x.pad <- paste(x.in, padding[nchar(x.in)], sep="")   ## "1.12" -> "1.120"
>
> x.nodot <- sub(".", "", x.pad, fixed=TRUE)   ## drop the decimal point
>
> x <- as.integer(x.nodot)   ## "1120" -> 1120
>
>
> Rich
>
>
> On Thu, Jan 1, 2015 at 1:21 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
>>
>>> On 31/12/2014 8:44 PM, David Winsemius wrote:
>>>>
>>>>
>>>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
>>>>
>>>>> This is probably a FAQ, and I don't really have a question about it, but
>>>>> I just ran across this in something I was working on:
>>>>>
>>>>>> as.integer(1000*1.003)
>>>>>
>>>>> [1] 1002
>>>>>
>>>>> I didn't expect it, but maybe I should have.  I guess it's about the
>>>>> machine precision added to the fact that as.integer truncates (rounds down
>>>>> for positive values):
>>>>>
>>>>>
>>>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps)
>>>>>
>>>>> [1] 1002
>>>>>
>>>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps)
>>>>>
>>>>> [1] 1003
>>>>>
>>>>>
>>>>> This does it right...
>>>>>
>>>>>> as.integer( round( 1000*1.003 ) )
>>>>>
>>>>> [1] 1003
>>>>>
>>>>> ...but this seems to always give the same answer and it is a little
>>>>> faster in my application:
>>>>>
>>>>>> as.integer( 1000*1.003 + .1 )
>>>>>
>>>>> [1] 1003
>>>>>
>>>>>
>>>>> FYI - I'm reading in a long vector of numbers from a text file with no
>>>>> more than three digits to the right of the decimal.  I'm converting them to
>>>>> integers and saving them in binary format.
>>>>>
>>>>
>>>> So just add 0.0001 or even .0000001 to all of them and coerce to integer.
>>>
>>>
>>> I don't think the original problem was stated clearly, so I'm not sure
>>> whether this is a solution, but it looks wrong to me.  If you want to round
>>> to the nearest integer, why not use round() (without the as.integer
>>> afterwards)?  Or if you really do want an integer, why add 0.1 or 0.0001,
>>> why not add 0.5 before calling as.integer()?  This is the classical way to
>>> implement round().
>>>
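>>> For positive values that trick is just, e.g. (a tiny made-up example, not
>>> your actual data):
>>>
>>> x <- c(0.325, 1.12, 1.9, 1.003) * 1000
>>> as.integer(x + 0.5)   ## 325 1120 1900 1003
>>> round(x)              ## same result for these values
>>>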
>>> To state the problem clearly, I'd like to know what result is expected for
>>> any real number x.  Since R's numeric type only approximates the real
>>> numbers we might not be able to get a perfect match, but at least we could
>>> quantify how close we get.  Or is the input really character data?  The
>>> original post mentioned reading numbers from a text file.
>>
>>
>>
>> Maybe you'd like to know what I'm really doing.  I have 1600 text files each
>> with up to 16,000 lines with 3100 numbers per line, delimited by a single
>> space.  The numbers are between 0 and 2, inclusive, and they have up to
>> three digits to the right of the decimal.  Every possible value in that
>> range will occur in the data.  Some example numbers: 0 1 2 0.325 1.12 1.9.
>> I want to multiply by 1000 and store them as 16-bit integers (uint16).
>>
>> I've been reading in the data like so:
>>
>>> data <- scan( file=FILE, what=double(), nmax=3100*16000)
>>
>>
>> At first I tried making the integers like so:
>>
>>> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
>>
>>    user  system elapsed
>>   0.187   0.387   0.574
>>
>> I decided I should compare with the result I got using round():
>>
>>> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ;
>>> proc.time()-ptm
>>
>>    user  system elapsed
>>   1.595   0.757   2.352
>>
>> It is a curious fact that only a few of the values from 0 to 2000 disagree
>> between the two methods:
>>
>>> table( ints2[ ints2 != ints ] )
>>
>>
>>  1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
>> 35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112
>>
>> I understand that it's all about the problem of representing decimal numbers
>> in binary, but I still find some of the results a little surprising, like
>> that list of numbers from the table() output.  For another example:
>>
>>> 1000+3 - 1000*(1+3/1000)
>>
>> [1] 1.136868e-13
>>
>>> 3 - 1000*(0+3/1000)
>>
>> [1] 0
>>
>>> 2000+3 - 1000*(2+3/1000)
>>
>> [1] 0
>>
>> See what I mean?  So there is something special about the numbers around
>> 1000.
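>>
>> One way to see it is just to print more digits of the doubles involved -- the
>> first product is stored a hair above 1003 while the other two are exact:
>>
>> print(1000*(1+3/1000), digits=17)   ## a tiny bit more than 1003
>> print(1000*(2+3/1000), digits=17)   ## 2003 exactly, hence the zero above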
>>
>> Back to the question at hand:  I can avoid use of round() and speed things up
>> a little bit by just adding a small number after multiplying by 1000:
>>
>>> ptm <- proc.time() ; R3 <- as.integer( 1000 * data + .1 ) ;
>>> proc.time()-ptm
>>
>>    user  system elapsed
>>   0.224   0.594   0.818
>>
>> You point out that adding .5 makes sense.  That is probably a better idea
>> and I should take that approach under most conditions, but in this case we
>> can add anything between 2e-13 and about 0.99999999999 and always get the
>> same answer.  We also have to remember that if a number might be negative
>> (not a problem for me in this application), we need to subtract 0.5 instead
>> of adding it.
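>>
>> A sign-aware version would handle both cases in one expression (not needed
>> for this data, just for the record):
>>
>> x <- c(-1.003, 0, 1.003) * 1000
>> as.integer(x + sign(x) * 0.5)   ## -1003 0 1003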
>>
>> Anyway, right now this is what I'm actually doing:
>>
>>> con <- file( paste0(FILE, ".uint16"), "wb" )
>>> ptm <- proc.time() ; writeBin( as.integer( 1000 * scan( file=FILE,
>>> what=double(), nmax=3100*16000 ) + .1 ), con, size=2 ) ; proc.time()-ptm
>>
>> Read 48013406 items
>>    user  system elapsed
>>  10.263   0.733  10.991
>>>
>>> close(con)
>>
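>> Reading it back later should just be the reverse; something along these
>> lines ought to work (untested here, variable names made up):
>>
>> con2 <- file( paste0(FILE, ".uint16"), "rb" )
>> ints.back <- readBin( con2, what="integer", size=2, signed=FALSE,
>>                       n=3100*16000 )
>> close(con2)
>> data.back <- ints.back / 1000   ## back to the original 0-2 scale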
>>
>> By the way, writeBin() is something that I learned about here, from you,
>> Duncan.  Thanks for that, too.
>>
>> Mike
>>
>> --
>> Michael B. Miller, Ph.D.
>> University of Minnesota
>> http://scholar.google.com/citations?user=EV_phq4AAAAJ
>>
>>
>


