[Rd] print(...,digits=2) behavior

Wed Feb 9 13:55:59 CET 2011

On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote:
> >>>>> Ben Bolker <bbolker at gmail.com>
> >>>>>     on Sat, 5 Feb 2011 15:58:09 -0500 writes:
> 
>     >   A bug was recently posted to the R bug database (which
>     > probably would better have been posted as a query here) as
>     > to why this happens:
> 
>     >> print(7.921,digits=2)
>     > [1] 8
>     >> print(7.92,digits=2)
>     > [1] 7.9
> 
[...]

> I had started to delve into this after you've posted the bug
> report. It is clearly a bug(let),
> caused by code that has been in  R  from its very
> beginning, at least in the first source code I've seen in 1994.
> 
> The problem is not related to digits=2,
> but using such a low number of digits shows it more
> dramatically, e.g., also
> 
>  > print(5.9994, digits=4)
>  [1] 5.999
>  > print(5.9994001, digits=4)
>  [1] 6
> 
> Interestingly, the problem seems *not* to be present for
> digits = 1 (only).
> 
> I haven't found time to mathematically "analyze" it for
> determining a correct solution though.
> Note that fixing this bug(let) will probably (very slightly)
> change a huge number of R outputs .. so there is a caveat,
> but nonetheless, we must approach it.
> 
> The responsible code is the C function  scientific()
> in src/main/format.c 

I inspected the source of scientific() and formatReal() (2.13.0, revision
2011-02-08 r54284). Let me point out an example of the difference between
the output of print() and rounding to "digits" significant digits, which
is slightly different from the previous ones, since also the exponent
changes. Namely,

  print(9.991, digits=3)
  [1] 10

while rounding to 3 digits yields 9.99. The reason is in scientific(),
where the situation that rounding increases the exponent is tested using

    /* make sure that alpha is in [1,10) AFTER rounding */

    if (10.0 - alpha < eps*alpha) {
        alpha /= 10.0;
        kp += 1;
    }

Here, eps is determined in formatReal() as

    double eps = pow(10.0, -(double)R_print.digits);

so we have eps = 10^-digits. The above condition on alpha is equivalent to

  alpha > 10.0/(1 + 10^-digits)

For digits=3, this is

  alpha > 9.99000999000999

This bound may be verified as follows

  print(9.9900099900, digits=3)
  [1] 9.99

  print(9.9900099901, digits=3)
  [1] 10

The existing algorithm for choosing the number of digits is designed to
predict the format suitable for all numbers in a vector before the actual
call of sprintf() or snprintf(). For speed, this algorithm should use
the standard double precision, so it is necessarily inaccurate for precision
15 and more and there may be some rare such cases also for smaller
precisions. For smaller precisions, say below 7, the algorithm can be made
more precise. This would change the output in rare cases and mainly for
printing single numbers. If a vector is printed, then the format is typically
not determined by the problematic numbers.

Changing the default behavior may be undesirable for backward compatibility
reasons. If this is the case, then a possible solution is to make the
documentation more precise on this and include pointers to possible
solutions. The functions sprintf(), formatC() and signif() may be used. In
particular, if signif() is used to round the numbers before printing, then we
get the correct output

  print(signif(7.921, digits=2))  
  [1] 7.9

  print(signif(9.9900099901, digits=3))
  [1] 9.99

The current ?print.default contains

  digits: a non-null value for ‘digits’ specifies the minimum number of
          significant digits to be printed in values.

The word "minimum" here means that all numbers in the vector will have
at least the chosen number of digits, but some may have more. I suggest
to add "See 'options' for more detail.".

The current ?options contains

    ‘digits’: controls the number of digits to print when printing
          numeric values.  It is a suggestion only.

I suggest to extend this to

   It is a suggestion only and the actual number of printed digits
   may be smaller, if the relative error of the output number is
   less than 10^-digits. Use 'signif(, digits)' before printing to get
   the exact number of the printed significant digits.

I appreciate to know the opinion of R developers on this.

Petr Savicky.