[Rd] Improving string concatenation

Hervé Pagès hpages at fredhutch.org
Thu Jun 18 00:55:21 CEST 2015


Hi Bill,

On 06/17/2015 12:36 PM, William Dunlap wrote:
> if '+' and paste don't change their behavior with respect to
> factors but you encourage people to use '+' instead of paste
> then you will run into problems with data.frame columns because
> many people don't notice whether a character-like column is
> character or factor.  With paste() this is not a problem but with '+'
> it is.  I think it is good not to make people worry about this much.
>
> As for the recycling issue, consider calls involving NULL arguments,
>    > f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
>    > f(1)
>    [1] "1 test failed"
>    > f(0)
>    [1] "0 tests failed"
> If paste0 followed the same recycling rules as "+" then f(1) would return
> character(0).  There is a fair bit of code like that on CRAN.

OTOH a very common use case is to use paste (or paste0) to add a given
prefix (or suffix) to a bunch of strings:

   paste0("ID", x)  # buggy! (returns "ID" instead of character(0) if length(x) is 0)

This is like "adding" something to 'x' so it's conceptually no different
from doing:

   x + 5

which does the right thing (it returns numeric(0)) when 'x' is a numeric(0).
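
A minimal sketch of the contrast described above. The helper name
add_prefix() is hypothetical (it is not part of base R) and only illustrates
one way to guard the zero-length case by hand:

```r
x <- character(0)

# paste0() treats a zero-length argument as "" when another argument is
# non-empty, so the prefix comes back on its own.
res_paste <- paste0("ID", x)
print(res_paste)   # [1] "ID"

# Arithmetic returns a zero-length result when either operand is zero-length.
res_plus <- numeric(0) + 5
print(res_plus)    # numeric(0)

# Hypothetical helper: add a prefix, but return character(0) for empty input.
add_prefix <- function(prefix, x) {
  if (length(x) == 0L)
    return(character(0))
  paste0(prefix, x)
}

print(add_prefix("ID", x))            # character(0)
print(add_prefix("ID", c("a", "b")))  # [1] "IDa" "IDb"
```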

Anyway, I don't think anybody suggested changing the recycling rules
of paste() or paste0(). That would of course break existing code that
relies on them (though the same can be said of almost any change in
behavior, right?). The suggestion was only that `+`, if it were used to
concatenate strings, would follow the same recycling rules that `+` and
the other binary arithmetic and comparison operators follow today.
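
A rough sketch of what those recycling rules would mean for strings, written
as a user-level operator rather than a change to `+` itself. The name `%+%`
is hypothetical, and raising an error on non-character operands is an
assumption made for this illustration, not a description of any actual
proposal or implementation:

```r
# Hypothetical string-concatenation operator using arithmetic-style recycling.
`%+%` <- function(e1, e2) {
  if (!is.character(e1) || !is.character(e2))
    stop("both operands must be character vectors")
  n1 <- length(e1)
  n2 <- length(e2)
  # As with arithmetic, a zero-length operand gives a zero-length result.
  if (n1 == 0L || n2 == 0L)
    return(character(0))
  # Mirror the warning arithmetic gives for fractional recycling.
  if (max(n1, n2) %% min(n1, n2) != 0L)
    warning("longer object length is not a multiple of shorter object length")
  paste0(e1, e2)
}

print("ID" %+% c("a", "b"))    # [1] "IDa" "IDb"
print("ID" %+% character(0))   # character(0)
```

With this sketch, a factor or a number on either side stops with an error
instead of being silently coerced.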

Cheers,
H.

>
> Consider using sprintf() to get the sort of recycling rules that "+" uses
>    > sprintf("%s is %d", c("One","Two"), numeric(0))
>    character(0)
>    > sprintf("%s is %d", c("One","Two"), 17)
>    [1] "One is 17" "Two is 17"
>    > sprintf("%s is %d", c("One","Two"), 26:27)
>    [1] "One is 26" "Two is 27"
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Jun 17, 2015 at 9:56 AM, Gábor Csárdi <csardi.gabor at gmail.com>
> wrote:
>
>> On Wed, Jun 17, 2015 at 12:45 PM, William Dunlap <wdunlap at tibco.com>
>> wrote:
>>>> ... adding the ability to concat
>>>> strings with '+' would be a relatively simple addition (no pun intended) to
>>>> the code base I believe. With a lot of other languages supporting this kind
>>>> of concatenation, this is what surprised me most when first learning R.
>>>
>>> Wow!  R has a lot of surprising features and I would have thought
>>> this would be quite a way down the list.
>>
>> Well, it is hard to guess what users and people in general find
>> surprising. As '+' is used for string concatenation in essentially all
>> major scripting (and many other) languages, personally I am not
>> surprised that this is surprising for people. :)
>>
>>> How would this new '+' deal with factors, as paste does or as the current '+'
>>> does?
>>
>> The same as before. It would not change the behavior for other
>> classes, only basic characters.
>>
>>> Would number+string and string+number cause errors (as in current
>>> '+' in R and python) or coerce both to strings (as in current R:paste and
>>> in perl's '+').
>>
>> Would cause errors, exactly as it does right now.
>>
>>> Having '+' work on all types of data can let improperly imported data
>>> get further into the system before triggering an error.
>>
>> Nobody is asking for this. Only characters, not all types of data.
>>
>>> I see lots of errors
>>> reported on this list that are due to read.table interpreting text as character
>>> strings instead of the numbers that the user expected.  Detecting that
>>> error as early as possible is good.
>>
>> Isn't that a problem with read.table then? Detecting it there would be
>> the earliest possible, no?
>>
>> Gabor
>>
>> [...]
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


