[Rd] Improving string concatenation

MacQueen, Don macqueen1 at llnl.gov
Thu Jun 18 19:34:50 CEST 2015


At the risk of unnecessarily (annoyingly?) prolonging a conversation that
has died down...

I don't think I've seen the sep or collapse arguments to paste mentioned
as aspects to consider. I don't see any way in which this version of '+'
could offer those arguments. Hence I would consider this version of '+' to
be just a convenience function, i.e., a function that, for convenience,
implements a special case of a more general function. It would not be a
different type of concatenation, nor would it improve the current methods
of string concatenation.
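
A minimal sketch of that limitation follows. The operator name %+% is
hypothetical (it is not part of base R) and stands in for the proposed
'+'. A binary operator takes exactly two operands, so it has nowhere to
put sep or collapse.

```r
# Hypothetical stand-in for the proposed string '+'; not part of base R.
`%+%` <- function(e1, e2) paste0(e1, e2)

x <- c("a", "b", "c")

# A binary operator can only express the sep = "" special case.
plain <- x %+% "1"

# paste() additionally accepts sep and collapse.
with_sep  <- paste(x, "1", sep = "-")
collapsed <- paste(x, collapse = "+")

print(plain)
print(with_sep)
print(collapsed)
```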

There is precedent in R for convenience functions. Indeed, I consider
paste0 to be a convenience function for paste with sep=''. read.csv and
several others are convenience functions that implement special cases of
read.table. 
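
As a small illustration of the "special case" relationship, the sketch
below checks each convenience function against the general function
called with explicit arguments. The sample data are made up, and the
read.table arguments are the documented read.csv defaults.

```r
x <- c("a", "b")

# paste0(...) versus paste(..., sep = "")
same_paste <- identical(paste0(x, 1:2), paste(x, 1:2, sep = ""))

# read.csv(...) versus read.table(...) with read.csv's defaults spelled out
csv_text  <- "id,name\n1,one\n2,two"
via_csv   <- read.csv(text = csv_text)
via_table <- read.table(text = csv_text, header = TRUE, sep = ",",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")
same_read <- identical(via_csv, via_table)

print(same_paste)
print(same_read)
```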

Viewed that way, I see no intrinsic conceptual impediment to introducing a
version of '+' that does string concatenation. Of course, those who did
the work would have to decide how it would handle recycling and other
issues that have been raised.
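
To make the recycling decision concrete, here is a rough sketch of the
two candidate behaviours for a zero-length operand. The helper
concat_arith is hypothetical and only mimics the zero-length rule of
arithmetic operators. It is not a proposal for how '+' should be
implemented.

```r
# Hypothetical helper: concatenation with the arithmetic-style rule that
# any zero-length operand gives a zero-length result.
concat_arith <- function(e1, e2) {
  if (length(e1) == 0L || length(e2) == 0L) return(character(0))
  paste0(e1, e2)
}

# paste0 treats a zero-length argument as "".
paste_style <- paste0("a", character(0))

# The arithmetic-style rule drops the result entirely.
arith_style <- concat_arith("a", character(0))

print(paste_style)
print(arith_style)
```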

However, whether or not it would be a good idea to do so, or worth the
effort, is not clear.

I've never felt that "it would be nice if R did something the same way
as language X" is by itself a strong argument for introducing a new
function or capability. Speaking as a long-time user, I wouldn't ask R
core to spend time on it. Would I use it if it were available? Possibly
over time I might migrate toward using it in simple situations.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 6/17/15, 12:36 PM, "R-devel on behalf of William Dunlap"
<r-devel-bounces at r-project.org on behalf of wdunlap at tibco.com> wrote:

>if '+' and paste don't change their behavior with respect to
>factors but you encourage people to use '+' instead of paste
>then you will run into problems with data.frame columns because
>many people don't notice whether a character-like column is
>character or factor.  With paste() this is not a problem but with '+'
>it is.  I think it is good not to make people worry about this much.
>
>As for the recycling issue, consider calls involving NULL arguments,
>  > f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
>  > f(1)
>  [1] "1 test failed"
>  > f(0)
>  [1] "0 tests failed"
>If paste0 followed the same recycling rules as "+" then f(1) would return
>character(0).  There is a fair bit of code like that on CRAN.
>
>Consider using sprintf() to get the sort of recycling rules that "+" uses
>  > sprintf("%s is %d", c("One","Two"), numeric(0))
>  character(0)
>  > sprintf("%s is %d", c("One","Two"), 17)
>  [1] "One is 17" "Two is 17"
>  > sprintf("%s is %d", c("One","Two"), 26:27)
>  [1] "One is 26" "Two is 27"
>
>
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>On Wed, Jun 17, 2015 at 9:56 AM, Gábor Csárdi <csardi.gabor at gmail.com>
>wrote:
>
>> On Wed, Jun 17, 2015 at 12:45 PM, William Dunlap <wdunlap at tibco.com>
>> wrote:
>> >> ... adding the ability to concat
>> >> strings with '+' would be a relatively simple addition (no pun
>> >> intended) to the code base I believe. With a lot of other languages
>> >> supporting this kind of concatenation, this is what surprised me
>> >> most when first learning R.
>> >
>> > Wow!  R has a lot of surprising features and I would have thought
>> > this would be quite a way down the list.
>>
>> Well, it is hard to guess what users and people in general find
>> surprising. As '+' is used for string concatenation in essentially all
>> major scripting (and many other) languages, personally I am not
>> surprised that this is surprising for people. :)
>>
>> > How would this new '+' deal with factors, as paste does or as the
>> > current '+' does?
>>
>> The same as before. It would not change the behavior for other
>> classes, only basic characters.
>>
>> > Would number+string and string+number cause errors (as in current
>> > '+' in R and python) or coerce both to strings (as in current R:paste
>> > and in perl's '+').
>>
>> Would cause errors, exactly as it does right now.
>>
>> > Having '+' work on all types of data can let improperly imported data
>> > get further into the system before triggering an error.
>>
>> Nobody is asking for this. Only characters, not all types of data.
>>
>> > I see lots of errors reported on this list that are due to
>> > read.table interpreting text as character strings instead of the
>> > numbers that the user expected.  Detecting that error as early as
>> > possible is good.
>>
>> Isn't that a problem with read.table then? Detecting it there would be
>> the earliest possible, no?
>>
>> Gabor
>>
>> [...]
>>


