[Rd] round(x, dig) [was "Development version of R fails tests .."]

Steven Dirkse
Tue Feb 18 05:55:35 CET 2020


Martin,

Yes, the subject of rounding is quite fascinating once one realizes that
we round a binary representation to a value expressed in decimal, but then
store that value in binary again.  It is also interesting to note that there
seems to be little consensus on what the correct behavior is.  Maybe that's
a good place to start.  First some observations:

1. round(x,n) depends on the double x, not on some string value that when
converted to double yields x.  So one cannot insist that
round(55.5555555,6)  returns 55.555556.  The decimal value 55.5555555 is
not exactly representable as a double, so this particular value is one the
round function will never see.  round() will only ever see the closest
double to this value.
2. It is possible to produce "correctly rounded decimal representations" of
doubles.  The C routine sprintf() does this, or at least it is supposed
to.  Of course, we get a string and not a double, but it's a correctly
rounded value.  And if one doesn't believe sprintf() does this, no matter:
there are other routines that do.  To be precise, a "correctly rounded
decimal representation" of x is the closest-to-x base-10 value with n
digits past the decimal point.  In the event of ties, use banker's rounding.
3. There are routines that, given a decimal string like 55.5555, produce a
"correctly rounded double", i.e. the closest double representation and, in
the event of tie, a result computed using banker's rounding.  IMHO the C
strtod() does exactly this.  Again, there are other routines that do if you
don't like strtod.
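
The three observations can be checked at the R prompt.  This is only a
sketch: it assumes that R's sprintf() hands the work to a C library that
prints correctly rounded digits (glibc does, other platforms may not), and
it uses as.numeric() as a stand-in for strtod(), which is my substitution
and not guaranteed to be correctly rounded in every case.  The variable
names are mine.

```r
x <- 55.5555555   # the parser stores the nearest double, not this decimal

# Observation 1: many digits reveal the double that round() actually receives.
stored <- sprintf("%.25f", x)
print(stored)

# Observation 2: decimal representation of x with 6 digits past the point.
s <- sprintf("%.6f", x)
print(s)

# Observation 3: nearest double to a decimal string
# (as.numeric() stands in for strtod() here).
y <- as.numeric(s)
print(y == x)     # FALSE: the rounded value is a different double than x
```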

So as a sort of reference implementation or definition, I would propose
that the result of round(x,n) would be the value you get by first rounding
x to n decimal digits - as a string!! - and then converting this string to
a double.  The most obvious implementation of this definition is not very
fast, but at least it's unambiguous.  And with some tweaks it might not be
so slow after all . . .
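
To make the proposal concrete, here is the obvious (slow) version written
in R.  It is a sketch of the definition, not a suggested patch: round_ref
is a name I made up, it only handles digits >= 0, and it inherits whatever
the platform's sprintf() and R's as.numeric() do, so it is only as correct
as those two conversions are.

```r
# Hypothetical reference implementation of the proposed definition:
# round to n decimal digits as a string, then convert the string to a double.
round_ref <- function(x, n) {
  stopifnot(length(n) == 1, n >= 0)
  fmt <- paste0("%.", as.integer(n), "f")  # e.g. "%.4f"
  s <- sprintf(fmt, x)                     # step 1: decimal string
  as.numeric(s)                            # step 2: nearest double
}

print(round_ref(9.18665, 4))
print(round_ref(0.15, 1))
```

Because sprintf() is vectorized over x, the function accepts a numeric
vector as well as a single value.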

If we give up on having a proper definition of what round(x,n) should
actually return, we could easily wind up chasing our tails.

To follow up on the specific example of how to compute

x <- 9.18665
round(x,4)

the definition above implies that round(x,4) is 9.1867.  The decimal value
9.18665 is not exactly representable as a double, and the nearest double is
a little higher than the decimal string.  Since the decimal value lies
exactly halfway between the two candidates for rounding to four digits
(9.1866 and 9.1867), the direction of the error made when representing the
decimal in binary determines the final result - or, if you prefer to think
only in terms of decimal representations, the direction of the rounding.
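
One way to see on which side of the tie the stored double falls is to print
more digits than the literal has.  Again a sketch, assuming the C library
behind sprintf() prints correctly rounded digits; digits20 and rounded4 are
just names I picked.

```r
x <- 9.18665

# Decimal expansion of the stored double to 20 places: anything nonzero
# after "9.18665" shows the double is not the decimal value itself.
digits20 <- sprintf("%.20f", x)
print(digits20)

# The same double rounded to four places, as a string.
rounded4 <- sprintf("%.4f", x)
print(rounded4)
```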

I look forward to hearing your thoughts on this suggested definition.


-Steve

On Sat, Feb 8, 2020 at 12:03 PM Martin Maechler <maechler using stat.math.ethz.ch>
wrote:

> >>>>> Hugh Parsonage
> >>>>>     on Sat, 8 Feb 2020 21:12:43 +1100 writes:
>
>     > The only observation I can make is that the change to
>     > round() was made in r77727 whereas your R-devel appears to
>     > be r77715 (so would not exhibit the fixed behaviour).  My
>     > guess is that there was a perpetual installation failure
>     > after r77715 but that the test folder was still retrieved
>     > and used.
>
>
>     > On Sat, 8 Feb 2020 at 19:27, Berwin A Turlach <berwin.turlach using gmail.com> wrote:
>     >>
>     >> G'day all,
>     >>
>     >> I have daily scripts running to install the patched version of the
>     >> current R version and the development version of R on my linux box
>     >> (Ubuntu 18.04.4 LTS).
>     >>
>     >> The last development version that was successfully compiled and
>     >> installed was "R Under development (unstable) (2020-01-25 r77715)"
>     >> on 27 January.  Since then the script always fails as a regression
>     >> test seems to fail.  Specifically, in the tests/ subdirectory of my
>     >> build directory I have a file reg-tests-1d.Rout.fail which ends with:
>     >>
>     >> > ## more than half of the above were rounded *down* in R <= 3.6.x
>     >> > ## Some "wrong" test cases from CRAN packages (partly relying on wrong R <= 3.6.x behavior)
>     >> > stopifnot(exprs = {
>     >> +     all.equal(round(10.7775, digits=3), 10.778, tolerance = 1e-12) # even tol=0, was 10.777
>     >> +     all.equal(round(12345 / 1000,   2), 12.35 , tolerance = 1e-12) # even tol=0, was 12.34 in Rd
>     >> +     all.equal(round(9.18665, 4),        9.1866, tolerance = 1e-12) # even tol=0, was  9.1867
>     >> + })
>     >> Error: round(10.7775, digits = 3) and 10.778 are not equal:
>     >>   Mean relative difference: 9.27902e-05
>     >> Execution halted
>     >>
>     >> This happens while the 32bit architecture is installed, which is a
>     >> bit surprising as I get the following results for the last installed
>     >> version of R's development version:
>     >>
>     >> R Under development (unstable) (2020-01-25 r77715) -- "Unsuffered Consequences"
>     >> Copyright (C) 2020 The R Foundation for Statistical Computing
>     >> Platform: x86_64-pc-linux-gnu/32 (32-bit)
>     >> [...]
>     >> > round(10.7775, digits=3)
>     >> [1] 10.778
>     >>
>     >> and
>     >>
>     >> R Under development (unstable) (2020-01-25 r77715) -- "Unsuffered Consequences"
>     >> Copyright (C) 2020 The R Foundation for Statistical Computing
>     >> Platform: x86_64-pc-linux-gnu/64 (64-bit)
>     >> [...]
>     >> > round(10.7775, digits=3)
>     >> [1] 10.778
>     >>
>     >>
>     >> On the other hand, the R 3.6.2 version, that I mainly use at the
>     >> moment, gives the following results:
>     >>
>     >> R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
>     >> Copyright (C) 2019 The R Foundation for Statistical Computing
>     >> Platform: x86_64-pc-linux-gnu/32 (32-bit)
>     >> [...]
>     >> > round(10.7775, digits=3)
>     >> [1] 10.777
>     >>
>     >> and
>     >>
>     >> R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
>     >> Copyright (C) 2019 The R Foundation for Statistical Computing
>     >> Platform: x86_64-pc-linux-gnu/64 (64-bit)
>     >> [...]
>     >> > round(10.7775, digits=3)
>     >> [1] 10.777
>     >>
>     >>
>     >> So it seems as if the behaviour of round() has changed between R
>     >> 3.6.2 and the development version.  But I do not understand why this
>     >> test all of a sudden failed if the results from the last successfully
>     >> installed development version of R suggest that the test should be
>     >> passed.
>     >>
>     >> Thanks in advance for any insight and tips.
>     >>
>     >> Cheers,
>     >> Berwin
>
> Note that r77727 was the last of a few commits I made related to
> dealing with R's bug report PR#17668:
>   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17668
>
> which itself triggered an involved dialogue, mostly online,
> visible at the PR's URL above.
>
> It led me to also write an R package 'round' (in order to
> compare the round() versions of R 3.6.x and later)
> with a (not entirely polished) package vignette
> that explains how rounding to decimal digits is not at all
> trivial and why and how I ended up (*) improving R's
> round(x, digits) algorithm in R-devel.
>
> The CRAN version of the package
>     https://cran.r-project.org/package=round
>
>     install.packages("round")
>
> is not quite current, notably its vignette isn't and so I have
> mentioned in the above thread
> ( https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17668#c8 )
> that the latest version of the vignette is also available as
>
>      https://stat.ethz.ch/~maechler/R/Rounding.html
>
> You can install and load the devel version of 'round' by
>
>    remotes::install_gitlab("mmaechler/round")
>    require("round")
>
> and then look a bit at the different versions of round(.)  using
>
>    example(roundX)
>
> i.e. using round::roundX(x, digits, version)
>
> For those who read so far:  I'm really interested in getting
> critical (constructive) feedback and comments about what I've
> written there (in the bugzilla report, and the package vignette).
> It seems almost nobody till now has had much interest and time to delve
> into the somewhat intriguing issues.
>
> Best regards,
> Martin Maechler
> ETH Zurich and R Core team
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Steven Dirkse, Ph.D.
GAMS Development Corp.
office: 202.342.0180
