[Rd] base::format adds extraneous whitespace for some inputs

Martin Maechler <maechler using stat.math.ethz.ch>
Thu Jun 20 17:27:23 CEST 2019


>>>>> Sarah Goslee 
>>>>>     on Thu, 20 Jun 2019 09:56:44 -0400 writes:

    > I can reproduce this.
    > It has to do with whether the value rounds down to 9 or up to 10, and
    > thus needs another space, I think. I agree that it shouldn't happen,
    > but at least you can get rid of the space by using trim = TRUE.

    > # rounds to 9 vs 10

    > format(9.95, digits = 2)
    > format(9.96, digits = 2)

    > format(9.95, digits = 2, nsmall = 2)
    > format(9.96, digits = 2, nsmall = 2)

    > format(9.95, digits = 2, nsmall = 2, trim=TRUE)
    > format(9.96, digits = 2, nsmall = 2, trim=TRUE)

    > # rounds to 99 vs 100

    > format(99.94, digits = 3)
    > format(99.95, digits = 3)

    > format(99.94, digits = 3, nsmall = 2)
    > format(99.95, digits = 3, nsmall = 2)

    > format(99.94, digits = 3, nsmall = 2, trim=TRUE)
    > format(99.95, digits = 3, nsmall = 2, trim=TRUE)

Yes, indeed;
I had wanted to reply earlier, but did not get to.

I agree that this is bogus;
I've never encountered it as I've (almost?) never used 'nsmall' consciously.

Interestingly, this behavior has probably existed unchanged for close to R's
full history.  The 'nsmall = *' optional argument (of
format.default() to be precise) was introduced in R 1.3.0   in 2001.

And in my still-working version of R 1.3.1, the behavior seems
similar (though not identical), I think.

You can inspect the underlying computations from the R level using
format.info(). It calls the same C code that is really used here by
.Internal(format(...)):

e.g.

> format.info(9.91, 2, 2)
[1] 4 2 0

 ==> result will use 4 characters

> format.info(9.99, 2, 2)
[1] 5 2 0

 ==> result will use 5 characters
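
A minimal sketch of that comparison, assuming only base R: it puts the
width reported by format.info() next to the string that format()
actually returns. The helper name show_width() is made up for
illustration, and the widths you see may differ between R versions.

```r
# Hypothetical helper (not part of base R): compare format.info() with format().
show_width <- function(x, digits, nsmall) {
  # format.info() returns width, number of decimals, and an exponent indicator
  info <- format.info(x, digits = digits, nsmall = nsmall)
  out  <- format(x, digits = digits, nsmall = nsmall)
  cat(sprintf("x = %s: format.info width = %d, format() -> '%s' (%d chars)\n",
              as.character(x), info[1], out, nchar(out)))
  invisible(list(info = info, out = out))
}

res1 <- show_width(9.91, digits = 2, nsmall = 2)
res2 <- show_width(9.99, digits = 2, nsmall = 2)
```

If the two reported widths differ for inputs that print with the same
number of significant characters, that is the padding discussed in this
thread.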



-----------------

One more thing:  format() has really been designed (in S, and
inherited by R) to format *several* numbers, often matrices (or
data frames if you must), so that they look nice when printed.

For this (in cases like these, with numbers),
format() must find a common format for all numbers, and that is
the reason the underlying algorithm is quite sophisticated:
it needs to cover many borderline cases, notably
deciding when exponential format is needed, etc etc.
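
To illustrate the "common format" point, here is a small sketch using
only base R. The input vector is an arbitrary example, not taken from
the original report.

```r
# One common width and number of decimals for the whole vector:
# shorter entries are left-padded with spaces.
x  <- c(1.5, 10, 100)
fx <- format(x)
print(fx)
print(nchar(fx))   # every element has the same number of characters

# trim = TRUE suppresses the padding to a common width.
ft <- format(x, trim = TRUE)
print(ft)
```

The padding is what makes columns line up when a vector or matrix is
printed; for a single number it serves no purpose.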


For format()ting simple numbers (i.e. numeric vectors of length *one*),
using  formatC()  (or even sprintf())  is typically faster and easier
to use; for sprintf() you need to know C-standard formatting a bit.
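
A short sketch of both alternatives for a single number, assuming you
want fixed notation with exactly two decimals (the choice of "%.2f" and
format = "f" is mine, made for illustration):

```r
x <- 9.99

a <- sprintf("%.2f", x)                    # C-style: fixed notation, 2 decimals
b <- formatC(x, format = "f", digits = 2)  # same idea via formatC()

print(c(a, b))
```

Neither call pads the result to a common width, so no leading space
appears here.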


    >> sessionInfo()
    > R version 3.5.3 (2019-03-11)
    > Platform: x86_64-redhat-linux-gnu (64-bit)
    > Running under: Fedora 28 (Workstation Edition)

    ..........

    >> # rounds to 9 vs 10
    >> 
    >> format(9.95, digits = 2)
    > [1] "9.9"
    >> format(9.96, digits = 2)
    > [1] "10"
    >> 
    >> format(9.95, digits = 2, nsmall = 2)
    > [1] "9.95"
    >> format(9.96, digits = 2, nsmall = 2)
    > [1] " 9.96"
    >> 
    >> format(9.95, digits = 2, nsmall = 2, trim=TRUE)
    > [1] "9.95"
    >> format(9.96, digits = 2, nsmall = 2, trim=TRUE)
    > [1] "9.96"
    >> 
    >> # rounds to 99 vs 100
    >> 
    >> format(99.94, digits = 3)
    > [1] "99.9"
    >> format(99.95, digits = 3)
    > [1] "100"
    >> 
    >> format(99.94, digits = 3, nsmall = 2)
    > [1] "99.94"
    >> format(99.95, digits = 3, nsmall = 2)
    > [1] " 99.95"
    >> 
    >> format(99.94, digits = 3, nsmall = 2, trim=TRUE)
    > [1] "99.94"
    >> format(99.95, digits = 3, nsmall = 2, trim=TRUE)
    > [1] "99.95"

    > On Thu, Jun 20, 2019 at 3:19 AM David J. Birke <djbirke using berkeley.edu> wrote:
    >> 
    >> Dear R Core Team,
    >> 
    >> First of all, thank you for your amazing work on developing and
    >> maintaining this wonderful language.
    >> 
    >> I just stumbled upon the following behavior in R version 3.6.0:
    >> 
    >> format(9.91, digits = 2, nsmall = 2)
    >> format(9.99, digits = 2, nsmall = 2)
    >> 
    >> yield "9.91" and " 9.99" with an extraneous whitespace.
    >> 
    >> My expected output for the second command is "9.99".
    >> 
    >> I have not found anything explaining the whitespace in the help files.
    >> Therefore, I am writing to report this behavior as a possible bug.
    >> 
    >> Best wishes,
    >> David
    >> 
    >> ______________________________________________
    >> R-devel using r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel



    > -- 
    > Sarah Goslee (she/her)
    > http://www.numberwright.com



