[Rd] string concatenation operator (revisited)

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Sun Dec 5 03:26:05 CET 2021


Grant,

One nit to consider is that the default behavior of paste(), which includes a space as a separator, would not be a perfect choice for the usual meaning of plus.

I would prefer a+b to be "helloworld" in your example, and to get the result you show you would write

a + " " + b

which I assume would put in a space where you want it and not where you don't.

As I am sure you have been told, you already can make an operator like this:

`%+%` <- function(x, y) paste0(x, y)

And then use:

a %+% b

And to do it this way, you might have two such functions, where %+% does NOT add a space but the odd versions, % +% (with a space in its name) or %++%, do add a space!

`%+%` <- function(x, y) paste0(x, y)  # paste0() has no sep argument
`%++%` <- function(x, y) paste0(x, " ",  y)
`% +%` <- function(x, y) paste0(x, " ",  y)

Now testing it with:

a = "hello"; b = "world" # NOTE I removed the trailing space you had in "a".

> a %+% b
[1] "helloworld"
> a %++% b
[1] "hello world"
> a % +% b
[1] "hello world"

It also seems to work with multiple units mixed in a row as shown below:

> a %+% b % +% a %++% b
[1] "helloworld hello world"

And it sort of works with vectors of strings or numbers using string concatenation:

> a <- letters[1:3]
> b <- seq(from=101, to = 301, by = 100)
> a %+% b %+% a
[1] "a101a" "b201b" "c301c"

But are you asking for a naked "+" sign to be vectorized like that?
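
To make that question concrete, here is a small sketch of how the paste0()-based %+% from above behaves at the edges of vectorization. It uses only base R, and the results in the comments follow from the documented behavior of paste0(). A naked "+" for strings would presumably have to pick a policy for each of these cases.

```r
`%+%` <- function(x, y) paste0(x, y)

# Unequal lengths: the shorter operand is recycled silently,
# whereas numeric c(1, 2, 3) + c(4, 5) would at least warn.
letters[1:3] %+% c("x", "y")   # "ax" "by" "cx"

# Missing values are converted to the literal string "NA".
"a" %+% NA                     # "aNA"

# A zero-length operand is treated as an empty string.
"a" %+% character(0)           # "a"
```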

And what if someone accidentally types something like:

a = "text"
a = a + 1

The addition now looks like adding an integer to a text string. In some languages, such as JavaScript, this results in an implicit conversion that makes "text1" the result. My work-around does that:

> a = a %+% 1
> a
[1] "text1"
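
For comparison, here is a minimal sketch of what unmodified R does with the naked plus in that accidental case. The error text in the comment is the message seen in an English locale and may differ under other locale settings.

```r
a <- "text"

# Base R signals an error rather than coercing either operand.
msg <- tryCatch(a + 1, error = function(e) conditionMessage(e))
msg
# In an English locale: "non-numeric argument to binary operator"

# The explicit route performs the conversion only when asked.
paste0(a, 1)   # "text1"
```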

BUT what you are asking for is for R to do normal addition if a and b are both numeric and presumably do (as languages like Python do) text concatenation when they are both text. What do you suggest happen if one is numeric and the other is text or perhaps some arbitrary data type? 

I checked to see what Python version 3.9 does:

>>> 5 + 4
9
>>> "5" + "4"
'54'
>>> "5" + 4
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    "5" + 4
TypeError: can only concatenate str (not "int") to str

It is clear it does not normally support such mixed operand types, although I could probably easily create an object sub-class with a dunder method that checks whether one of the two things being added can be coerced into a string or into a number, as needed, so the two types match.

But this is about R.

As others have said, the early design philosophy of R did not head the same way as some other languages. R is mostly not object-oriented in the same sense as they are, so some things are not trivially done but can be done in other ways, like the %+% technique above.
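
One of those other ways, roughly the R counterpart of a sub-class with a dunder method, is an S3 class with its own method for plus. The sketch below is only an illustration: the class name "cstr" and its constructor are hypothetical, and coercing the other operand to character is just one possible policy. Plain character vectors carry no class attribute, so they do not dispatch this way. That is why the wrapper is needed.

```r
# Hypothetical S3 class "cstr": a character vector with a class attribute,
# so that `+` dispatches to a method instead of built-in arithmetic.
cstr <- function(x) structure(as.character(x), class = "cstr")

# Method for `+` when at least one operand is a "cstr" object.
# The other operand is pasted as text (an illustrative choice).
`+.cstr` <- function(e1, e2) {
  cstr(paste0(unclass(e1), unclass(e2)))
}

# Print the underlying character vector without the class attribute.
print.cstr <- function(x, ...) print(unclass(x), ...)

a <- cstr("hello")
greeting <- a + " " + "world"
greeting        # prints "hello world"
1 + 2           # ordinary addition is untouched
```

The price is that at least one operand in each step has to be wrapped with cstr() first, which is arguably no shorter than calling paste0() directly.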

But R also allows weird things like this: 
# VERY CAREFULLY as overwriting "+" means you cannot use it in your other ...
# So not a suggested idea but if done you must preserve the original meaning of plus elsewhere like I do.

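flexible_plus <- function(first, second) {
  # Call the primitive explicitly: once `+` is rebound to this function, a bare
  # first + second here can call flexible_plus again and recurse without end.
  if (all(is.numeric(first), is.numeric(second))) return(base::`+`(first, second))
  if (all(is.character(first), is.character(second))) return(paste0(first, second))
  # If you reach here, there is an error
  print("ERROR: both arguments must be numeric or both character")
  return(NULL)
}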

Now define things carefully so that something like the flexible_plus function I created becomes the MEANING of a naked plus sign. But note it will now be used everywhere in any code that does addition, so it is not an ideal solution. It does sort of work, FWIW.

`%+++%` <- `+`
`+` <- flexible_plus

Finally some testing:

> 5 %+++% 3
[1] 8
> flexible_plus(5, 3)
[1] 8
> 5 + 3
[1] 8
> "hello" + "world"
[1] "helloworld"
> "hello" + 5
[1] "ERROR: both arguments must be numeric or both character"
NULL

It does seem to do approximately what I said it would do, and it also does some vectorized things as long as all operands are the same type:

> c(1,2,3) + 4
[1] 5 6 7
> c(1,2,3) + c(4,5,6)
[1] 5 7 9
> c("word1", "word2", "word3") + "more"
[1] "word1more" "word2more" "word3more"
> c("word1", "word2", "word3") + c("more", "snore")
[1] "word1more"  "word2snore" "word3more"

Again, the above code is for illustration purposes only. I would be beyond shocked if the above did not break something somewhere and it certainly is not as efficient as the built-in adder. As an exercise, it looks reasonable. LOL!
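
If someone does want to experiment, a somewhat less dangerous sketch is to shadow the plus sign only inside a local() block, so the global meaning of "+" is never touched. The name strict_plus is hypothetical. Unlike the version above, it signals a real error with stop() and calls the primitive explicitly as base::`+`. Like that version, it does not handle unary plus.

```r
# Hypothetical stricter variant: same-type operands only, real error otherwise.
strict_plus <- function(e1, e2) {
  if (is.numeric(e1) && is.numeric(e2)) return(base::`+`(e1, e2))
  if (is.character(e1) && is.character(e2)) return(paste0(e1, e2))
  stop("both arguments must be numeric or both character")
}

res <- local({
  `+` <- strict_plus          # shadows `+` only inside this block
  list(num = 1 + 2, chr = "hello" + "world")
})
res$num   # 3
res$chr   # "helloworld"

# Mixed types now raise a condition that callers can catch.
mixed <- tryCatch(
  local({ `+` <- strict_plus; "hello" + 5 }),
  error = function(e) conditionMessage(e)
)
mixed     # "both arguments must be numeric or both character"
```

Outside the local() blocks, 1 + 2 still uses the built-in adder and "hello" + "world" is still an error.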


-----Original Message-----
From: R-devel <r-devel-bounces using r-project.org> On Behalf Of Grant McDermott
Sent: Saturday, December 4, 2021 5:37 PM
To: r-devel using r-project.org
Subject: [Rd] string concatenation operator (revisited)

Hi all,

I wonder if the R Core team might reconsider an old feature request, as detailed in this 2005 thread: https://stat.ethz.ch/pipermail/r-help/2005-February/thread.html#66698

The TL;DR version is base R support for a `+.character` method. This would essentially provide a shortcut to `paste0`, in much the same way that `\(x)` now provides a shortcut to `function(x)`.

> a = "hello "; b = "world"
> a + b
[1] "hello world"

I appreciate some of the original concerns raised against a native "string1 + string2" implementation. The above thread also provides several use-at-your-own-risk workarounds. But sixteen years is a long time in software development and R now stands as something of an exception on this score. Python, Julia, Stata, and SQL (among various others) all support native string concatenation/interpolation using binary/arithmetic operators. It's been a surprising source of frustration for students in some of the classes I teach, particularly those coming from another language.

Many thanks for considering.

PS. I hope I didn't miss any additional discussion of this issue beyond the original 2005 thread. My search efforts didn't turn anything else up, except this popular Stackoverflow question: https://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r

Grant McDermott
Assistant Professor
Department of Economics
University of Oregon
www.grantmcdermott.com

