[Rd] Feature Request: Allow Underscore Separated Numbers

Jim Hester
Fri Jul 15 21:25:48 CEST 2022


I think keeping it simple and less restrictive is the best approach:
it is easier to implement, limits future maintenance, and leaves users
free to format numbers however they wish. So I would probably lean
towards allowing multiple delimiters anywhere (including trailing), or
possibly only between digits.
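
To make the two options concrete, here is a minimal sketch in Python
rather than R, since R does not parse these literals today. The helper
names strip_permissive and strip_between_digits are made up for
illustration and are not part of R or of the quoted patch. The first
drops underscores wherever they appear, which is roughly the behaviour
of the quoted patch; the second accepts an underscore only when it sits
between two decimal digits (hex digits are not handled here).

```python
import re

# Hypothetical helpers for illustration only; neither exists in R or in the patch.


def strip_permissive(literal: str) -> str:
    """Drop every underscore, including repeated and trailing ones."""
    return literal.replace("_", "")


# An underscore with a decimal digit immediately before and after it.
_BETWEEN_DIGITS = re.compile(r"(?<=[0-9])_(?=[0-9])")


def strip_between_digits(literal: str) -> str:
    """Drop underscores only if each one sits between two decimal digits."""
    cleaned = _BETWEEN_DIGITS.sub("", literal)
    if "_" in cleaned:
        # Some underscore was doubled, leading, trailing, or next to a non-digit.
        raise ValueError(f"misplaced underscore in {literal!r}")
    return cleaned
```

With these definitions, strip_permissive("1__000_") returns "1000",
strip_between_digits("1_000_000") returns "1000000", and
strip_between_digits("1__000_") raises ValueError. The stricter rule
needs the extra check and an error path, which is the maintenance cost
I would prefer to avoid.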

On Fri, Jul 15, 2022 at 2:26 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>
> Thanks for posting that list.  The Python document is the only one I've
> read so far; it has a really nice summary
> (https://peps.python.org/pep-0515/#prior-art) of the differences in
> implementations among 10 languages.  Which choice would you recommend,
> and why?
>
>   - I think Ivan's quick solution doesn't quite match any of them.
>   - C, Fortran and C++ have special support in R, but none of them use
> underscore separators.
>   - C++ does support separators, but uses "'", not "_", and some ancient
> forms of Fortran ignore embedded spaces.
>
> Duncan Murdoch
>
> On 15/07/2022 1:58 p.m., Jim Hester wrote:
> > Allowing underscores in numeric literals is becoming a very common
> > feature in computing languages. All of these languages (and more) now
> > support it:
> >
> > python: https://peps.python.org/pep-0515/
> > javascript: https://v8.dev/features/numeric-separators
> > julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
> > java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
> > ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
> > perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
> > rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
> > C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
> > go: https://go.dev/ref/spec#Integer_literals
> >
> > Its use in this context also dates back to at least Ada 83
> > (http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal.)
> >
> > Many other communities see the benefit of this feature; I think R's
> > community would benefit from it as well.
> >
> > On Fri, Jul 15, 2022 at 1:22 PM Ivan Krylov <krylov.r00t using gmail.com> wrote:
> >>
> >> On Fri, 15 Jul 2022 11:25:32 -0400
> >> <avi.e.gross using gmail.com> wrote:
> >>
> >>> R normally delays evaluation so chunks of code are handed over
> >>> untouched to functions that often play with the text directly without
> >>> evaluating it until, perhaps, much later.
> >>
> >> Do they play with the text, or with the syntax tree after it went
> >> through the parser? While it's true that R saves the source text of the
> >> functions for ease of debugging, it's not guaranteed that a given
> >> object will have source references, and typical NSE functions operate
> >> on language objects which are tree-like structures containing R values,
> >> not source text.
> >>
> >> You are, of course, right that any changes to the syntax of the
> >> language must be carefully considered, but if anyone wants to play with
> >> this idea, it can be implemented in a very simple manner:
> >>
> >> --- src/main/gram.y     (revision 82598)
> >> +++ src/main/gram.y     (working copy)
> >> @@ -2526,7 +2526,7 @@
> >>       YYTEXT_PUSH(c, yyp);
> >>       /* We don't care about other than ASCII digits */
> >>       while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
> >> -          || c == 'x' || c == 'X' || c == 'L')
> >> +          || c == 'x' || c == 'X' || c == 'L' || c == '_')
> >>       {
> >>          count++;
> >>          if (c == 'L') /* must be at the end.  Won't allow 1Le3 (at present). */
> >> @@ -2533,6 +2533,9 @@
> >>          {   YYTEXT_PUSH(c, yyp);
> >>              break;
> >>          }
> >> +       if (c == '_') { /* allow an underscore anywhere inside the literal */
> >> +           continue;
> >> +       }
> >>
> >>          if (c == 'x' || c == 'X') {
> >>              if (count > 2 || last != '0') break;  /* 0x must be first */
> >>
> >> To an NSE function, the underscored literals are indistinguishable from
> >> normal ones, because they don't see the literals:
> >>
> >> stopifnot(all.equal(\() 1000000, \() 1_000_000))
> >> f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
> >> f(1e6, 1_000_000)
> >>
> >> Although it's true that the source references change as a result:
> >>
> >> lapply(
> >>   list(\() 1000000, \() 1_000_000),
> >>   \(.) as.character(getSrcref(.))
> >> )
> >> # [[1]]
> >> # [1] "\\() 1000000"
> >> #
> >> # [[2]]
> >> # [1] "\\() 1_000_000"
> >>
> >> This patch is somewhat simplistic: it allows both multiple underscores
> >> in succession and underscores at the end of the number literal. Perl
> >> does so too, but with a warning:
> >>
> >> perl -wE'say "true" if 1__000_ == 1000'
> >> # Misplaced _ in number at -e line 1.
> >> # Misplaced _ in number at -e line 1.
> >> # true
> >>
> >> --
> >> Best regards,
> >> Ivan
> >>
> >> ______________________________________________
> >> R-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>


