[Rd] "+" for character method...

Duncan Murdoch murdoch at stats.uwo.ca
Sat Aug 26 00:09:42 CEST 2006


On 8/25/2006 4:55 PM, Martin Maechler wrote:
>>>>>> "Duncan" == Duncan Murdoch <murdoch at stats.uwo.ca>
>>>>>>     on Fri, 25 Aug 2006 13:18:42 -0400 writes:
> 
>     Duncan> On 8/25/2006 12:31 PM, Martin Maechler wrote:
>     >> This thread reminds me of an old recurring (last May!)
>     >> theme which maybe fits well to Friday late afternoon...
>     >> 
>     >> There have been propositions to make "+" work in S (and
>     >> R) like in some other languages, namely for character
>     >> (vectors),
>     >> 
>     >> a + b := paste(a,b, sep="")
>     >> 
>     >> IIRC, when this theme came up last, the one argument
>     >> against it was the penalty of method dispatch that we
>     >> were not willing to pay for something as fundamentally
>     >> speed-important as "+" -- which is a .Primitive in R
>     >> exactly for that reason of efficiency.
>     >> 
>     >> But then, we actually do dispatch for "+" -- internally
>     >> in C code via DispatchGroup() --- but only if we need, so
>     >> not when usual numeric/complex arguments are used.
>     >> 
>     >> I think - but may be wrong - it should be possible to
>     >> also check very fast for two "character" arguments and in
>     >> that case do a fast version of paste(a, b, sep="").
> 
>     Duncan> But for consistency shouldn't this work if only one
>     Duncan> of the args is character, coercing the other to
>     Duncan> character?  E.g. we have
> 
>     >> "2" > 10
>     Duncan> [1] TRUE
> 
> yes.  But see also below
> 
>     >> When this last came up (in May), Brian said this about
>     >> the fact that you could not simply define
>     >> "+.character":
>     >> 
>     >>>> I would think that the intention was also to positively
>     >>>> discourage messing with the basics of R, as if you were
>     >>>> able to do this, erroneous uses would likely not get
>     >>>> caught.
>     >> (https://stat.ethz.ch/pipermail/r-help/2006-May/104751.html)
>     >> and subsequently
>     >> (https://stat.ethz.ch/pipermail/r-help/2006-May/104754.html)
>     >> gave an example for this
>     >> 
>     >>>> 2 + x, for example, where x is not numeric.
> 
>     Duncan> This is a valid concern, but I think the clarity
>     Duncan> obtained by coding paste operations using + is worth
>     Duncan> it.
> 
>     Duncan> For example, the first instance of paste(a, b,
>     Duncan> sep="") I see in the source is
> 
>     Duncan> is.ALL(structure(1:7, names = paste("a",1:7,sep="")))
> 
>     Duncan> in base/demo/is.things.R
> 
>     Duncan> which I find clearer as
> 
>     Duncan> is.ALL(structure(1:7, names = "a" + 1:7))
> 
> 
>     Duncan> But then I'm used to using + for strings from
>     Duncan> Borland's Pascal extensions; to a C-speaker the
>     Duncan> meaning may not be so obvious.
> 
> yes.  I think however if we keep speed and clarity and catching
> user errors all in mind, it would be enough - and better - to
> only dispatch to paste(.,.) when both arguments are character
> (vectors), i.e., the above case would need
>  "a" + as.character(1:7) or "a" + paste(1:7) or "a" + format(1:7)
> which after all is really clearer, even more so for cases like
>  "1" + 2  which I'd rather keep giving errors.
> 
> If  Char + Num  should work like above, then also
>     Num + Char  should (since after all, "+" should be commutative
>     apart from floating point precision issues).
> 
> and so the internal C code gets a bit more complicated and slightly
> slower..  something we had in mind we should strongly avoid...

I doubt that it would be measurably slower, but I agree that requiring 
both args to be Char could be done in fewer operations than just 
requiring one.

However, I think the consistency argument is stronger.  We have a rule 
that operations on mixed types promote the more restrictive type to the 
less restrictive one, and I don't think we should handle this case 
differently.
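
To illustrate the promotion rule referred to above, here is a small sketch of how current R already coerces mixed types. Nothing in it depends on the proposed "+" behaviour, and the variable names are only for illustration.

```r
# Existing coercion in base R: the more restrictive type is promoted
# to the less restrictive one (integer -> double -> character).
x <- c(1L, 2.5)   # the integer is promoted to double
y <- c(1, "a")    # the double is promoted to character
z <- "2" > 10     # 10 is coerced to "10"; the comparison is on strings

typeof(x)         # "double"
y                 # "1" "a"
z                 # TRUE, as in the example quoted earlier
```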

So I'd say we should allow all of Char + Num, Num + Char, and Char + 
Char, or, if this costs too much at evaluation time, we shouldn't allow 
any of them.
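
For comparison, the two candidate semantics can be sketched at the R level. The names concat_strict, concat_promote and the operator %+% are hypothetical and not part of base R; the actual proposal would live in the C code behind the "+" primitive, so this is only an approximation of the behaviour, not of the implementation.

```r
# Hypothetical helper: concatenate only when BOTH operands are character
# (the stricter variant, where "1" + 2 stays an error).
concat_strict <- function(a, b) {
  if (!is.character(a) || !is.character(b))
    stop("both operands must be character")
  paste(a, b, sep = "")
}

# Hypothetical helper: promote a non-character operand to character,
# so Char + Num, Num + Char and Char + Char all work.
# paste() itself coerces its arguments with as.character().
concat_promote <- function(a, b) paste(a, b, sep = "")

# A user-defined infix operator standing in for the proposed "+".
`%+%` <- concat_promote

"a" %+% 1:7                      # same as paste("a", 1:7, sep = "")
2 %+% "x"                        # "2x"
concat_strict("a", "b")          # "ab"
try(concat_strict("1", 2))       # error: both operands must be character
```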

Duncan Murdoch


> 
> Martin
> 
>     >> I wonder however, if we do this in C, and basically only
>     >> go into the paste-branch when both arguments are
>     >> characters, if we wouldn't get to a nice useful solution
>     >> without a noticeable performance penalty.
>     >> 
>     >> This would also solve my other, slightly related
>     >> uneasiness: many times in the past, when using
>     >> paste(..., sep='') in function definitions, I had wanted
>     >> this (empty sep) to be the default and to have an
>     >> easier, more readable way to achieve the same.
>     >> 
>     >> But then these all are just musings at the end of the
>     >> week...
>     >> 
>     >> Martin Maechler, ETH Zurich
>     >> 
>     >> ______________________________________________
>     >> R-devel at r-project.org mailing list
>     >> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
