[Rd] R 2.5.0 refuses to print enough digits to recover exact floating point values

Petr Savicky savicky at cs.cas.cz
Tue Jun 26 17:03:24 CEST 2007


I would like to reply to the following email from May in the thread
  [Rd] R 2.5.0 refuses to print enough digits to recover exact floating point values

On Wed, May 23, 2007 at 06:32:36PM +0100, Prof Brian Ripley wrote:
> I think this is a bug in the MacOS X runtime.  I've checked the C99
> standard, and can see no limits on the precision that you should be able
> to specify to printf.
> 
> Not that some protection against such OSes would come amiss.
> 
> However, the C99 standard does make clear that [sf]printf is not required 
> (even as 'recommended practice') to be accurate to more than *_DIG places, 
> which as ?as.character has pointed out is 15 on the machines R runs on.

Let me refer to the C99 draft
  http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1124.pdf
Section 7.19.6, Recommended practice, paragraph 13, specifies that
rounding to at most DECIMAL_DIG significant digits should be done correctly.

Annex F.5, Binary-decimal conversion, explicitly states that conversion
from the widest supported IEC 60559 format to decimal with DECIMAL_DIG
digits and back is the identity function. Footnote 305 specifies that if
double (53-bit precision) is the widest supported format, then
DECIMAL_DIG >= 17.

R uses double precision by default, so I think DECIMAL_DIG = 17 is a
reasonable value. This is the minimum number of decimal digits for which
the conversion
   double -> string -> double
is value preserving, provided the rounding is correct.

DBL_DIG = 15 is another constant: the maximum number of digits for which
the conversion
   string -> double -> string
can be value preserving.
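
To illustrate both directions, here is a small check (an illustration only;
the outcomes are what one expects on a platform whose printf/strtod
conversions are correctly rounded and may differ elsewhere):

  x <- 1/3
  identical(x, as.numeric(sprintf("%.17g", x)))   # 17 digits: expected TRUE
  identical(x, as.numeric(sprintf("%.15g", x)))   # 15 digits: FALSE for this value

  s <- "0.123456789012345"                        # 15 significant digits
  identical(sprintf("%.15g", as.numeric(s)), s)   # expected TRUE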

> It really is the case that writing out a number to > 15 significant digits 
> and reading it back in again is not required to give exactly the same 
> IEC60559 (sic) number, and furthermore there are R platforms for which 
> this does not happen.  What Mr Weinberg claimed is 'now impossible' never 
> was possible in general (and he seems to be berating the R developers for 
> not striving to do better than the C standard requires of OSes).  In fact, 
> I believe this to be impossible unless you have access to extended 
> precision arithmetic, as otherwise printf/scanf have to use the same 
> arithmetic as the computations.
> 
> This is why R supports transfer of floating-point numbers via readBin and 
> friends, and uses a binary format itself for save() (by default).
> 
> I should also say that any algorithm that depends on that level of details 
> in the numbers will depend on the compiler used and optimization level and 
> so on.  Don't expect repeatability to that level even with binary data 
> unless you (are allowed to) never apply bugfixes to your system.

Repeatability can hardly be achieved across different R installations, for
the reasons you explain. However, repeatability of a computation within the
same installation can be achieved and is sometimes useful, for example for
debugging, much like set.seed. Saving the data in binary is one possible
approach, but decimal numbers in a text file can serve this purpose equally
well if they are precise enough.
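
As a small illustration of this (again assuming correctly rounded
conversions, which is exactly what is at stake in this thread):

  x <- runif(5)
  f <- tempfile()
  writeLines(sprintf("%.17g", x), f)       # 17 significant digits as text
  identical(x, as.numeric(readLines(f)))   # expected TRUE within one installation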

I would like to ask a question concerning the function scientific
in src/main/format.c, which is called from formatReal in the same file.
Among other things, function scientific determines whether a smaller number
of significant digits than R_print.digits is sufficient to get a decimal
number with relative error at most max(10^-R_print.digits, 2*DBL_EPSILON),
where DBL_EPSILON = 2^-52.

For simplicity, let me consider only the case R_print.digits = DBL_DIG
(which is 15). Then the bound on the relative error is 1e-15,
since 1e-15 > 2*2^-52.
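
As I understand it, the criterion amounts to roughly the following R-level
paraphrase (only a sketch with an invented name, not the actual C code,
which works on the already formatted digits and has its own rounding errors,
as discussed below):

  # Smallest number of significant digits whose rounding keeps the relative
  # error within max(10^-digits, 2*.Machine$double.eps).
  min.sig.digits <- function(x, digits = 15) {
      tol <- max(10^-digits, 2 * .Machine$double.eps)
      for (d in 1:digits) {
          if (abs(signif(x, d) - x) <= tol * abs(x)) return(d)
      }
      digits
  }
  min.sig.digits(0.5)   # 1
  min.sig.digits(1/3)   # 15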

There are two error bounds under consideration:
(1) the absolute error of the rounded mantissa (significand) is at most 5e-15,
(2) the relative error of the rounded number is at most 1e-15.

It is natural to consider (1) as well, since R_print.digits = 15 and 5e-15 is
the maximum absolute error of the mantissa under correct rounding
to 15 significant digits.

If the mantissa is in [1,5], then (2) => (1), since a relative error of at
most 1e-15 then means an absolute mantissa error of at most 5e-15. Hence,
for these mantissas, function scientific will suggest fewer than 15 digits
only if the result is as accurate as rounding to 15 digits.

On the other hand, if the mantissa is in (5,10), then (2) does not imply (1):
for a mantissa of 9, say, a relative error of 1e-15 still allows an absolute
mantissa error of 9e-15, well above 5e-15. Hence, for these mantissas we
sometimes do not get full-precision 15-digit numbers. Is this behaviour
intentional?
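
For instance (the first value is taken from the write.table examples below,
where 14 digits are what I observe; the second value just moves the mantissa
below 5, so all 15 digits should be kept):

  as.character(9.14363081376609)   # "9.1436308137661"  (14 digits)
  as.character(3.14363081376609)   # "3.14363081376609" (15 digits expected)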

This has, for example, the following consequence: if a 15-digit number
is read into R using read.table and then written back to text using
write.table, we need not get the same number back.

For testing, I use the following platforms:

      FLAGS stands for CFLAGS, FFLAGS, CXXFLAGS, FCFLAGS;
      on Linux, the R version is 2.5.1 RC (2007-06-23 r42041),
      on Windows, it is the binary distribution of R-2.5.1 (RC).

  [1] AMD Athlon(tm) 64 Processor 3500+, SUSE 10.1,
      gcc 4.1.0, FLAGS = -g -O2 -march=pentium4 -mfpmath=sse

  [2] AMD Athlon(tm) 64 Processor 3500+, SUSE 10.1,
      gcc 4.1.0, FLAGS = default

  [3] Intel Xeon processor, Fedora Core 2,
      gcc 3.3.3, FLAGS = -g -O2 -march=pentium4 -mfpmath=sse

  [4] Intel Xeon processor, Fedora Core 2,
      gcc 3.3.3, FLAGS = default

  [5] Pentium III, SuSE 10.0,
      gcc 4.0.2, FLAGS = -g -O2 -ffloat-store

  [6] Pentium III, SuSE 10.0,
      gcc 4.0.2, FLAGS = default

  [7] Pentium III, Windows XP.

On all 7 platforms, the problem with reproducing 15-digit numbers
appears, for example, for the following numbers:

  a <- read.table(stdin());
  9.14363081376609
  9.15664585780591
  9.21585022237569
  9.23994602039041
  9.32750736234049
  9.35423342635191
  9.44884219569079
  9.47261017210051

  write.table(a,file="");
  # "V1"
  # "1" 9.1436308137661
  # "2" 9.1566458578059
  # "3" 9.2158502223757
  # "4" 9.2399460203904
  # "5" 9.3275073623405
  # "6" 9.3542334263519
  # "7" 9.4488421956908
  # "8" 9.4726101721005

On platforms [1,3,5,7], the problem also appears for smaller mantissas:

  a <- read.table(stdin());
  7.57660233100411
  7.61904762387919
  8.05515628983101
  8.07459617374249

  write.table(a,file="");
  # "V1"
  # "1" 7.5766023310041
  # "2" 7.6190476238792
  # "3" 8.055156289831
  # "4" 8.0745961737425

An alternative approach would be to print the value with R_print.digits
(at most 17) significant digits and eliminate trailing zeros from the
mantissa. This would guarantee (1) for 15 digits.

More generally, the absolute error of the mantissa would be at most
5*10^-R_print.digits, and if the number can be represented with fewer
digits while keeping the mantissa error below 5*10^-R_print.digits, such
a shorter representation would be generated.
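
At the R level, the idea could be sketched as follows (an illustration only,
with an invented name; the real change would have to be made in formatReal
and scientific in src/main/format.c):

  # Format with 17 significant digits, then drop trailing zeros of the
  # mantissa (and the decimal point, if no fractional digits remain).
  format.full <- function(x) {
      s <- sprintf("%.16e", x)    # 1 digit before the point + 16 after = 17
      sub("\\.?0*e", "e", s)
  }
  format.full(0.5)                # "5e-01"
  format.full(9.14363081376609)   # all digits needed to recover the value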

For functions which format each value in a vector separately (as.character,
cat, write.table), removing trailing zeros may be done already in sprintf.
For the R function print, which uses the same number of decimal positions
for all numbers in a vector, the string representations of all values
in the vector have to be seen before a decision on the number of trailing
zeros to remove is made.

Also, I would like to point out that due to rounding errors in function
scientific, the actual errors sometimes exceed both bounds (1) and (2).
To demonstrate this, use the following:

  # Split a mantissa string such as "5.1767..." into its digits, dropping
  # the decimal point (the second character).
  vect <- function(z) { v <- strsplit(z,"")[[1]]; as.integer(v[-2]); }
  # Absolute difference of two decimal mantissas, computed digit by digit.
  diff <- function(x,y) { x <- vect(x); y <- vect(y);
      y <- c(y,rep(0,times=length(x)-length(y))); # assuming length(x) >= length(y)
      abs(sum((x - y)*10^-(1:length(x) - 1))); }

  x <- c(
  5.1767789744225927e+21,
  8.8010507503190114e+22,
  8.0551457236914104e+23,
  6.1185028214373079e+24,
  4.1491936503345055e+25,
  9.4921247662789880e+25);
  y <- as.character(x);
  mant.y <- sub("e.*","",y);
  z <- formatC(x,digits=21,width=-1,format="e");
  mant.z <- sub("e.*","",z);

  err <- matrix(nrow=length(x),ncol=2);
  for (i in 1:length(x)) {
      err[i,1] <- diff(mant.z[i],mant.y[i]);
      err[i,2] <- err[i,1]/as.double(mant.z[i]);
  }

  data.frame(formatC=z,as.character=y,abs.err=err[,1],rel.err=err[,2]);

On platforms [1,3,5], the output is (abs.err is the absolute error of the mantissa):

                        formatC        as.character      abs.err      rel.err
  1 5.176778974422592651264e+21 5.1767789744226e+21 7.348736e-15 1.419558e-15
  2 8.801050750319011351757e+22  8.801050750319e+22 1.135176e-14 1.289818e-15
  3 8.055145723691410401526e+23 8.0551457236914e+23 1.040153e-14 1.291290e-15
  4 6.118502821437307892007e+24 6.1185028214373e+24 7.892007e-15 1.289859e-15
  5 4.149193650334505529822e+25 4.1491936503345e+25 5.529822e-15 1.332746e-15
  6 9.492124766278988034841e+25 9.4921247662790e+25 1.196516e-14 1.260535e-15

On platforms [2,4,6], as.character works correctly on these examples.

On [7] (Windows), the output is:

                        formatC         as.character      abs.err      rel.err
  1 5.176778974422592651157e+21  5.1767789744226e+21 7.348843e-15 1.419578e-15
  2 8.801050750319011351831e+22   8.801050750319e+22 1.135183e-14 1.289827e-15
  3 8.055145723691410400945e+23  8.0551457236914e+23 1.040095e-14 1.291217e-15
  4 6.118502821437307892093e+24  6.1185028214373e+24 7.892093e-15 1.289873e-15
  5 4.149193650334505529750e+25 4.14919365033451e+25 4.470250e-15 1.077378e-15
  6 9.492124766278988034877e+25  9.4921247662790e+25 1.196512e-14 1.260532e-15

formatC has smaller precision here, so the errors in the table are computed
with smaller precision. However, apart from x[5], the actual errors of the
conversion are the same as on platforms [1,3,5], since the numbers in x are
the same (verified using save/load in binary format) and, apart from x[5],
the result of as.character is also the same. The number x[5] is converted
correctly here.

All the best, Petr.


