[R] Rounding problem R vs Excel

Wed Jun 4 07:24:28 CEST 2003

On Tue, 2003-06-03 at 08:32, Duncan Murdoch wrote: 
> On Tue, 3 Jun 2003 09:49:44 +0100, you wrote in message
> <00b701c329ad$14ef6460$ad002850 at FSSFQCV7BGDVED>:
> 
> >Duncan
> >If the numbers are not represently exactly how does R resolve problems like
> >the one below? Is there something that needs to be set up in the R
> >environment like the number of significant figures?
> >
> >> x<-4.145*100+0.5
> >> x
> >[1] 415
> >> floor(x)
> >[1] 414
> 
> R doesn't do anything to resolve this problem; it's just the way the
> IEEE standard floating point formats work.  In Excel 97, 4.145*100+0.5
> is exactly equal to 415; I would guess this is either because they use
> a binary coded decimal format instead of the IEEE floating point
> types, or they round results internally in some way.  R doesn't
> support BCD formats, and doesn't do tricky rounding behind your back.
> You get what you ask for.
> 
> If you want the calculation above to give you exactly 415, the
> standard workaround in languages without BCD formats is to work in
> some decimal multiple of the actual numbers you're interested in, e.g.
> 10000.  Then you would store 4.145 as 41450, multiply by 1000000 (i.e.
> 100*10000) and divide by 10000 to give 4145000, and add 5000, to give
> 4150000.  All of these numbers are exactly representable in double
> precision floating point types, because they are all integers with
> fewer than 53 bits in their binary representations.  
> 
> Doing this means you need to change the definitions of *, /, ^, and
> lots of other low level functions, but + and - work in the usual way.
> It might be an interesting project to write a package that does all of
> this.
> 
> Duncan Murdoch

In Excel, the IEEE standard (754) is used to internally represent
floats. A MS-KB article on this is here:

http://support.microsoft.com/default.aspx?scid=kb;[LN];214118

Another, more detailed, is here:

http://support.microsoft.com/default.aspx?scid=kb;EN-US;78113

What is curious about this situation, and apropos to Prof. Ripley's
comments about the difference between internal representation, rounding
and displayed values, is the following information. Note how the results
of cell calculations differ between Excel, OpenOffice.org Calc and
Gnumeric. In each case, I use a format setting of 20 digits after the
decimal place with scientific notation. This is best read with a fixed
width font.

OOo Calc 1.0.2 and 1.1 Beta2:

Cell Formula          Value
= 4.145 * 100 + 0.5   4.15000000000000000000E+02
= 0.5 - 0.4 - 0.1     0.00000000000000000000E+00
=(0.5 - 0.4 - 0.1)    0.00000000000000000000E+00

Excel 2002 (XP):

Cell Formula          Value
= 4.145 * 100 + 0.5   4.15000000000000000000E+02
= 0.5 - 0.4 - 0.1     0.00000000000000000000E+00
=(0.5 - 0.4 - 0.1)    -2.77555756156289000000E-17

Gnumeric 1.0.12:

Cell Formula          Value
= 4.145 * 100 + 0.5   +4.14999999999999943157E+02
= 0.5 - 0.4 - 0.1     -2.77555756156289135106E-17
*Gnumeric does not appear to allow the surrounding parens.

For comparison, R 1.7.1 Beta under RH 9 and WinXP:

> print(4.145 * 100 + 0.5, digits = 20)
[1] 414.99999999999994
> formatC(4.145 * 100 + 0.5, format = "E", digits = 20)
[1] "4.14999999999999943157E+02"

> print(0.5 - 0.4 - 0.1, digits = 20)
[1] -2.775557561562891e-17
> formatC(0.5 - 0.4 - 0.1, format = "E", digits = 20)
[1] "-2.77555756156289135106E-17"

What is interesting is the change in the displayed value in Excel when
the second formula is surrounded by parens (which I found purely by
accident). This would suggest that there may be something going on in
the parsing of the cell formula that affects the calculation and
displayed value. Also note the precision of the resultant number.

Presuming that each of the spreadsheet programs are using IEEE standard
internal representation, there are clearly differences in the way in
which each visually displays the values, both by default and when
explicitly formatted.

Using the following cell formula:

= 1.333 + 1.225 - 1.333 - 1.225

there is an indication in the second MS-KB article above, that Excel 97
introduced an "optimization" dealing with results near zero. "The
example above when performed in Excel 97 and later correctly displays 0
or 0.000000000000000E+00 in scientific notation." 

whereas 

"Rather than displaying 0, Excel 95 displays -2.22044604925031E-16."

The terms "optimization" and "correctly displays" are an interesting
choice of words.

I have a post to one of the OOo forums regarding my inability to
replicate the IEEE precision issues in Calc under any circumstances
using the three formulas and any numeric formatting options. It may be
that the OOo folks copied the MS Excel "optimization" with no override.

FYI...the IEEE has a reference site for the standard here:

http://grouper.ieee.org/groups/754/

HTH,

Marc Schwartz