# [R] Precision in R

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Mon Feb 26 09:21:12 CET 2018

```On Sun, 25 Feb 2018, Iuri Gavronski wrote:

> Hi,
>
> Why sum() on a 10-item vector produces a different value than its
> counterpart on a 2-item vector? I understand the problems related to
> the arithmetic precision in storing decimal numbers in binary format,
> but shouldn't the errors be equal regardless of the method used?

Since you understand the problems, why are you asking? More to the point,
why are you asking as if this was a property of R rather than a property
of all typical IEEE754 floating point implementations?

For others Googling this: short answer is don't make your program behavior
depend on the least significant digits... they are not reliable. Read
about the difference between double precision (53 bits or about 15 digits
) and extended precision (63 bits or about 18 digits for Intel math
forum (e.g. ) since this issue can crop up in practically any
programming language.

> See my example:

Examples are good. Diving into numerical analysis theory not so much.

>> options(digits=22)

Don't be misleading... you are not going to get 22digits with standard
numeric types. Note that you get 17 digits below, and two of those are
artifacts pulled from the artificially-expanded extended-precision values
in the coprocessor... the numbers in memory are not that precise. The best
that can be said for using 17digit printed values is that you have the
best shot at being able to reproduce the actual 15 digits of double
precision in a different program or instance of R if you keep them.

>> x=rep(.1,10)
>> x
>  0.10000000000000001 0.10000000000000001 0.10000000000000001
>  0.10000000000000001 0.10000000000000001 0.10000000000000001
>  0.10000000000000001 0.10000000000000001 0.10000000000000001
>  0.10000000000000001

Sitting in memory (not extended precision!)

>> sum(x)
>  1

sum() is vectorized (compact compiled C code)... the intermediate values
can stay in the coprocessor at extended precision. Only the result is
trimmed back to double precision.

>> y=0
>> for (i in 1:10) {y=sum(y+x[i])}

Why are you calling the sum function on a scalar value? The addition has

The actual intermediate values in this for loop get stored in RAM at
double precision and get re-expanded to extended precision each time
an addition occurs and then rounded again when the addition is done.
That is a lot of rounding compared to the sum function.

>> y
>  0.99999999999999989
>> y*10^6
>  999999.99999999988

Note that 10 = 2*5 = b'0010' * b'0101' is not just a power of 2 so both
the binary mantissa and binary exponent are getting modified when you
multiply by powers of 10, so the rounding may be different for various
powers of 10.

>> sum(x)*10^6
>  1e+06
>> z=.1+.1+.1+.1+.1+.1+.1+.1+.1+.1

More double precision.

>> z
>  0.99999999999999989

Similar to for loop.

Don't rely on the last few digits to be stable. Take a course in numerical
analysis if you just cannot take my word for it that those last few digits
are going wander off a bit no matter what you do.

 https://en.wikipedia.org/wiki/IEEE_754
 https://en.wikipedia.org/wiki/Extended_precision
 https://cs.stackexchange.com/search?q=ieee+754

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...