[R] strangely long floating point with write.table()

Sat Mar 15 20:10:22 CET 2014

Thanks for the ideas.  It is great to have such skilled assistance with 
this issue.  That said, I don't think we've solved this one yet.

Looking back at where my numbers came from, I found that I had read in 
integers from a file, divided by 1000, then (critically) subtracted those 
numbers from 2.  It turns out that the important part seems to be the 
subtraction, not the data source.

It isn't necessary to read in the data to get the effect.  Here is a 
simple example:

write.table(c(1,2)-c(0.995,1.995), file="data.txt", row.names=FALSE, col.names=FALSE)

$ cat data.txt
0.005
0.00499999999999989
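
The same effect can be seen without writing a file.  The sketch below 
assumes, as ?write.table describes, that numeric values are converted 
with the internal equivalent of digits = 15 rather than with 
getOption("digits").  The variable name x is only for illustration.

```r
# The two subtractions from the example above.
x <- c(1, 2) - c(0.995, 1.995)

# print() honours getOption("digits") (7 by default), so the two
# elements look identical at the console.
print(x)

# Asking for more significant digits exposes the difference.
print(format(x, digits = 15))
print(sprintf("%.17g", x))

# An exact comparison of the second element with 0.005 fails, while a
# comparison with a tolerance succeeds.
print(x[2] == 0.005)
print(isTRUE(all.equal(x[2], 0.005)))
```

So the two elements are different doubles that happen to agree in their 
first 7 significant digits, which is all the console shows by default.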

Here is another simple example that uses seq() and does not require 
reading in data.  The two commands should produce the same values, but 
the numbers are represented very differently in the output.  What causes 
the inconsistency within and between these two output files?

> write.table(1-seq(0.995,0.840,-0.005), file="data1.txt", row.names=FALSE, col.names=FALSE)
> write.table(2-seq(1.995,1.840,-0.005), file="data2.txt", row.names=FALSE, col.names=FALSE)

$ head -33 data[12].txt
==> data1.txt <==
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
0.05
0.055
0.0600000000000001
0.0649999999999999
0.0700000000000001
0.075
0.08
0.085
0.09
0.095
0.1
0.105
0.11
0.115
0.12
0.125
0.13
0.135
0.14
0.145
0.15
0.155
0.16

==> data2.txt <==
0.00499999999999989
0.00999999999999979
0.0149999999999999
0.0199999999999998
0.0249999999999999
0.0299999999999998
0.0349999999999999
0.0399999999999998
0.0449999999999999
0.0499999999999998
0.0549999999999999
0.0599999999999998
0.0649999999999999
0.0699999999999998
0.075
0.0799999999999998
0.085
0.0899999999999999
0.095
0.0999999999999999
0.105
0.11
0.115
0.12
0.125
0.13
0.135
0.14
0.145
0.15
0.155
0.16
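
One way to quantify the difference between the two files is to compare 
each vector with (5 * 1:32) / 1000, which needs only one correctly 
rounded division per element.  This is a sketch only; the names a, b and 
target are made up for illustration.

```r
# The two vectors written to data1.txt and data2.txt above.
a <- 1 - seq(0.995, 0.840, -0.005)
b <- 2 - seq(1.995, 1.840, -0.005)

# Reference values: an exact integer divided once by 1000.
target <- (5 * 1:32) / 1000

# How many elements are bit-for-bit equal to the reference?
print(sum(a == target))
print(sum(b == target))

# Largest absolute deviation from the reference in each vector.
print(max(abs(a - target)))
print(max(abs(b - target)))

# Spacing of doubles just above 1, for scale.
print(.Machine$double.eps)
```

As far as I can tell, the subtraction from 1 or 2 is exact for these 
values, so whatever error the seq() elements already carry arrives 
unchanged in the result.  Doubles between 1 and 2 are spaced about 
2.2e-16 apart, twice as coarsely as doubles between 0.5 and 1, so the 
elements near 1.995 can carry a larger absolute error than those near 
0.995.  In a result near 0.005 the 15th significant digit sits at about 
1e-17, so an error of that size is visible at 15 digits.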

Importantly, if I do this...

write.table(seq(0.005,0.160,0.005), file="data.txt", row.names=FALSE, col.names=FALSE)

...I'm producing all the same values, but no number in the output file 
has more than three digits to the right of the decimal point.
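
If the goal is simply an output file with at most three decimals, one 
possible workaround is to round or format the values before calling 
write.table().  This is a hedged sketch, not a diagnosis of the cause; 
the temporary file names and the variable vals are illustrative.

```r
# Values computed by subtraction, as in the data2.txt example.
vals <- 2 - seq(1.995, 1.840, -0.005)

out_raw     <- tempfile(fileext = ".txt")
out_rounded <- tempfile(fileext = ".txt")
out_fixed   <- tempfile(fileext = ".txt")

# Written as-is: long decimal expansions appear.
write.table(vals, file = out_raw, row.names = FALSE, col.names = FALSE)

# Rounded to three decimals first.
write.table(round(vals, 3), file = out_rounded,
            row.names = FALSE, col.names = FALSE)

# Alternatively, convert to fixed-width strings with exactly three
# decimals; quote = FALSE keeps the strings unquoted in the file.
write.table(formatC(vals, format = "f", digits = 3), file = out_fixed,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

print(head(readLines(out_raw), 3))
print(head(readLines(out_rounded), 3))
print(head(readLines(out_fixed), 3))
```

round() drops trailing zeros in the file (0.01), while formatC() with 
format = "f" keeps them (0.010); which is preferable depends on what will 
read the file.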

Thanks again for all of the helpful comments and ideas.

Best,
Mike

-- 
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ

On Sat, 15 Mar 2014, Duncan Murdoch wrote:

> On 14-03-14 11:03 PM, Mike Miller wrote:
>> On Fri, 14 Mar 2014, Duncan Murdoch wrote:
>> 
>>> On 14-03-14 8:59 PM, Mike Miller wrote:
>>>> What I'm using:
>>>> 
>>>> R version 3.0.1 (2013-05-16) -- "Good Sport"
>>>> Copyright (C) 2013 The R Foundation for Statistical Computing
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> 
>>> That's not current, but it's not very old...
>>> 
>>>> According to some docs, options(digits) controls numerical precision in
>>>> output of write.table().  I'm using the default value for digits:
>>>> 
>>>>> getOption("digits")
>>>> [1] 7
>>>> 
>>>> I have a bunch of numbers in a data frame that are only a few digits to
>>>> the right of the decimal:
>>> 
>>> That's not enough to reproduce this.  Put together a self-contained
>>> reproducible example if you're wondering why something behaves as it
>>> does. With just a bunch of output, you'll just get uninformed guesses.
>> 
>> 
>> Thanks for the tip.  Here's what I've done:
>> 
>>> data2 <- data[c(94,120),c(18,20,21)]
>
> Thanks, I got the data2.Rdata file.  Peter was right, you don't have what you 
> think you have in that dataframe.  See below.
>
>>> save(data2, file="data2.Rdata")
>>> q("no")
>> 
>> $ R
>>> load("data2.Rdata")
>>> data2
>>         V18   V20      V21
>> 94  0.008 0.008 0.000064
>> 120 0.023 0.023 0.000529
>
> I'll create a dataframe that looks like yours:
>
>> data3 <- data.frame(V18=c(0.008, 0.023), V20=c(0.008, 0.023), V21=c(0.000064, 0.000529))
>> data3
>    V18   V20      V21
> 1 0.008 0.008 0.000064
> 2 0.023 0.023 0.000529
>
>
> But it's not the same:
>
>> data2-data3
>              V18           V20           V21
> 94   6.938894e-18  6.938894e-18  1.219727e-19
> 120 -9.020562e-17 -9.020562e-17 -4.119968e-18
>
> I can't tell where these errors crept in; they are likely there in your 
> "data" object, which you didn't give us.  I'd guess as Peter did that your 
> numbers are the results of computations that introduced rounding error.
>
> Duncan Murdoch
>
>>> write.table(data2, file="data2.txt", sep="\t", row.names=F, col.names=F)
>> 
>> $ cat data2.txt
>> 0.00800000000000001     0.00800000000000001     6.40000000000001e-05
>> 0.0229999999999999      0.0229999999999999      0.000528999999999996
>> 
>> The data2.Rdata file is attached to this message.
>> 
>> I guess that is enough to reproduce this exact finding.  I don't know how
>> it works in general.
>> 
>> I don't have a newer version of R available right now.  It did the same
>> thing on an older version (2.15.1).
>> 
>> Interestingly, on a different machine with an even older version (2.12.2)
>> I see something a little different:
>> 
>> 0.008   0.008   6.40000000000001e-05
>> 0.0229999999999999      0.0229999999999999      0.000528999999999996
>> 
>> Best,
>> Mike
>> 
>