[Rd] [WISH / PATCH] possibility to split string literals across multiple lines

Joris Meys jorismeys at gmail.com
Wed Jun 14 14:18:15 CEST 2017


Mark, that's actually a fair statement, although your extra operator
doesn't cause construction at parse time. You still call paste0(), but just
add an extra layer on top of it.
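
A minimal sketch of what I mean, in base R only. The operator is the one from
your mail; the variable names are just for illustration. Parsing the %+%
expression gives a call that still has to be evaluated, whereas a plain
literal is already a character constant after parsing:

```r
# Operator as defined in the quoted message
`%+%` <- function(x, y) paste0(x, y)

# quote() parses the expression without evaluating it
parsed_op  <- quote("hello" %+% " world")
parsed_lit <- quote("hello world")

is.call(parsed_op)        # the operator form is an unevaluated call
is.character(parsed_lit)  # the literal is already a string constant

# Evaluating the call is what finally runs paste0()
identical(eval(parsed_op), parsed_lit)
```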

I also doubt that even in gigantic loops the benefit is going to be
significant. Take the following example:

atestfun <- function(x){
  y <- paste0("a very long",
         "string for testing")
  grep(x, y)
}
atestfun2 <- function(x){
  y <- "a very long
string for testing"
  grep(x,y)
}
library(compiler)  # provides cmpfun()
cfun <- cmpfun(atestfun)
cfun2 <- cmpfun(atestfun2)

require(rbenchmark)
benchmark(atestfun("a"),
          atestfun2("a"),
          cfun("a"),
          cfun2("a"),
          replications = 100000)

Which gives after 100,000 replications:

            test replications elapsed relative
1  atestfun("a")       100000    0.83    1.339
2 atestfun2("a")       100000    0.62    1.000
3      cfun("a")       100000    0.81    1.306
4     cfun2("a")       100000    0.62    1.000
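
For anyone without rbenchmark installed, a rough base-R variant of the same
comparison could look like the sketch below. The names f_paste, f_literal and
n are made up for illustration, and here both functions use exactly the same
string (no embedded newline), so only the construction differs. Timings will
vary by machine, so I give no numbers:

```r
# String built at run time with paste0()
f_paste <- function(x) {
  y <- paste0("a very long", "string for testing")
  grep(x, y)
}

# Same string written as a single literal
f_literal <- function(x) {
  y <- "a very longstring for testing"
  grep(x, y)
}

n <- 10000  # number of repetitions, chosen arbitrarily
t_paste   <- system.time(for (i in seq_len(n)) f_paste("a"))[["elapsed"]]
t_literal <- system.time(for (i in seq_len(n)) f_literal("a"))[["elapsed"]]

# Elapsed seconds for each variant
c(paste0 = t_paste, literal = t_literal)
```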

The patch can in principle make similar code marginally faster, but I'm not
convinced it will make any real difference except in some very specific and
exotic cases. Moreover, calling a function like the ones in the example
inside a loop is the only situation I can come up with where this might be
a problem. If you just construct the string inside the loop, there are two
possibilities:

- the string does not need to change, in which case you had better
construct it outside the loop
- the string does need to change, in which case you need paste() or
paste0() anyway
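
The two cases can be sketched as follows. This is only an illustration; the
names patterns, target, hits and labels are invented for the example:

```r
patterns <- c("a", "z")

# Case 1: the string is constant, so build it once before the loop
target <- paste0("a very long",
                 "string for testing")
hits <- logical(length(patterns))
for (i in seq_along(patterns)) {
  hits[i] <- grepl(patterns[i], target, fixed = TRUE)
}

# Case 2: the string depends on the loop variable, so paste0() is
# needed inside the loop regardless of how literals are parsed
labels <- character(length(patterns))
for (i in seq_along(patterns)) {
  labels[i] <- paste0("pattern ", i, ": ", patterns[i])
}
```

In the first case the cost of paste0() is paid once, so a multi-line literal
syntax would save nothing measurable; in the second it cannot replace the call.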

I'm not against incorporating the patch, as it would eliminate a few
keystrokes. It's a neat idea, but I don't expect any other noticeable
advantage from it.

my humble 2 cents
Cheers
Joris

On Wed, Jun 14, 2017 at 2:00 PM, Mark van der Loo <mark.vanderloo at gmail.com>
wrote:

> Having some line-breaking character for string literals would have benefits
> as string literals can then be constructed parse-time rather than run-time.
> I have run into this myself a few times as well. One way to at least
> emulate something like that is the following.
>
> `%+%` <- function(x,y) paste0(x,y)
>
> "hello" %+%
>   " pretty" %+%
>   " world"
>
>
> -Mark
>
>
>
> On Wed, 14 Jun 2017 at 13:53, Andreas Kersting <r-devel at akersting.de>
> wrote:
>
> > On Wed, 14 Jun 2017 06:12:09 -0500, Duncan Murdoch <
> > murdoch.duncan at gmail.com> wrote:
> >
> > > On 14/06/2017 5:58 AM, Andreas Kersting wrote:
> > > > Hi,
> > > >
> > > > I would really like to have a way to split long string literals
> > > > across multiple lines in R.
> > >
> > > I don't understand why you require the string to be a literal.  Why not
> > > construct the long string in an expression like
> > >
> > >   paste0("aaa",
> > >          "bbb")
> > >
> > > ?  Surely the execution time of the paste0 call is negligible.
> > >
> > > Duncan Murdoch
> >
> > Actually "execution time" is precisely one of the reasons why I would
> > like to see this feature as - depending on the context (e.g. in a tight
> > loop) - the execution time of paste0 (or probably also glue, thanks
> > Gabor) is not necessarily insignificant.
> >
> > The other reason is style: I think it is cleaner if we can construct such
> > a long string literal without the need for a function call.
> >
> > Andreas
> >
> > > >
> > > > Currently, if a string literal spans multiple lines, there is no way
> > > > to inhibit the introduction of newline characters:
> > > >
> > > >  > "aaa
> > > > + bbb"
> > > > [1] "aaa\nbbb"
> > > >
> > > >
> > > > If a line ends with a backslash, the backslash is just ignored:
> > > >
> > > >  > "aaa\
> > > > + bbb"
> > > > [1] "aaa\nbbb"
> > > >
> > > >
> > > > We could use this fact to implement string splitting in a fairly
> > > > backward-compatible way, since currently such trailing backslashes
> > > > should hardly be used as they do not have any effect. The attached
> > > > patch makes the parser ignore a newline character directly following
> > > > a backslash:
> > > >
> > > >  > "aaa\
> > > > + bbb"
> > > > [1] "aaabbb"
> > > >
> > > >
> > > > I personally would also prefer if leading blanks (spaces and tabs) in
> > > > the second line are ignored to allow for proper indentation:
> > > >
> > > >  >   "aaa \
> > > > +    bbb"
> > > > [1] "aaa bbb"
> > > >
> > > >  >   "aaa\
> > > > +    \ bbb"
> > > > [1] "aaa bbb"
> > > >
> > > > This is also implemented by this patch.
> > > >
> > > >
> > > > An alternative approach could be to have something like
> > > >
> > > > ("aaa "
> > > > "bbb")
> > > >
> > > > or
> > > >
> > > > ("aaa ",
> > > > "bbb")
> > > >
> > > > be interpreted as "aaa bbb".
> > > >
> > > > I don't know the ins and outs of the parser of R (hence: please very
> > > > carefully review the attached patch), but I guess this would be more
> > > > work to implement!?
> > > >
> > > >
> > > > What do you think? Is there anybody else who is missing this feature
> > > > in the first place?
> > > >
> > > > Regards,
> > > > Andreas
> > > >
> > > >
> > > >
> > > > ______________________________________________
> > > > R-devel at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > >
> >



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
