[Rd] "+" for character method...

Gabor Grothendieck ggrothendieck at gmail.com
Sat Aug 26 17:43:02 CEST 2006


There are several problems with %+% :

- %whatever% operators should stay open for users to define; if R starts
  taking them over, they won't be

- %+% is ugly

- %+% is not consistent with other languages (the C-based syntax
  of R is supposed to leverage off one's knowledge of other languages)

Personally, I would prefer any of the status quo, + or paste0 to %+%.
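
For illustration only, here is a minimal sketch of what a user-level %+%
could look like. It assumes the paste(a, b, sep="") definition quoted
further down and the stricter rule proposed in the thread, that both
arguments must be character. The operator body and the variable names are
hypothetical, not anything in base R.

```r
# Hypothetical user-defined operator; only the name %+% comes from the thread.
`%+%` <- function(a, b) {
  # Stricter variant: refuse mixed types so that "1" %+% 2 stays an error.
  stopifnot(is.character(a), is.character(b))
  paste(a, b, sep = "")
}

x <- "foo" %+% "bar"          # "foobar"
y <- c("a", "b") %+% "z"      # vectorised and recycled like paste(): "az" "bz"

# Concatenation is associative and has "" as identity, but is not commutative.
assoc    <- identical(("a" %+% "b") %+% "c", "a" %+% ("b" %+% "c"))
ident    <- identical("" %+% "a", "a") && identical("a" %+% "", "a")
commutes <- identical("a" %+% "b", "b" %+% "a")

# Mixed types are rejected rather than silently pasted.
mixed <- tryCatch("1" %+% 2, error = function(e) "error")
```

Because this is an ordinary %op% function, it occupies exactly the
user-level namespace that the first point above argues should be left alone.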

On 8/26/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 8/26/2006 10:26 AM, John Chambers wrote:
> > Well, two comments, in two non-compatible directions.
> >
> > 1.  I have to say that I find the idea of using "+" to paste character
> > strings together aesthetically ugly.
> >
> > IMO, one thing that makes functional object-based languages attractive
> > is that the generic function retains a consistent _function_, that is,
> > purpose and meaning, of which the methods are implementations.
> >
> > It escapes me totally why I should think of pasting strings as addition
> > in the mathematical or intuitive sense (as Brian points out re
> > commutativity, it fails a number of axiomatic properties).  And if so,
> > what about "-", "*",  "/" and so on?  The mind boggles.
>
> Assuming that your "totally" is literally true:
>
> Strings don't form a commutative group under concatenation, but the
> operation is associative, and there's a zero element "".  This makes
> them a monoid or unitary semigroup.  The natural numbers (including
> zero) are another example of a monoid under addition.  It's not that
> weird to have addition defined without negatives.
>
> Concatenation seems to me to be the most natural interpretation of
> addition for strings.
>
> According to Wikipedia, the "+" operator is used for concatenation in
> BASIC, Pascal, Delphi, JavaScript, Java, Python, C++ and Ruby.  These
> are probably the most commonly used modern languages other than C (which
> has no concatenation operator) or Fortran (which I just discovered today
> uses "//").
>
> Other possibilities on the Wikipedia page that don't conflict with
> something else in R are:
>
> Visual Basic and VHDL use the "&" sign.
>
> Standard SQL, PL/I, and Maple from version 6 use double pipe signs ("||").
>
> OCaml uses "^".
>
> So it seems to me that defining addition of strings to be concatenation
> is a reasonably widespread convention.
>
> I don't think there are widespread conventions for subtraction,
> multiplication or division of strings, so I can't see any argument for
> implementing them.
>
> > Its excuse presumably is to save typing, but I would favor using some
> > %thing% operator at the cost of a couple of extra key strokes.
>
> I think consistency with other common languages is a stronger reason.
> Other than that, I'd be perfectly happy with %+%.
>
> Duncan Murdoch
>
>
> >
> > 2.  Having said that,  it's a reasonable hope that efficiency of
> > dispatch will not be a serious problem.  There are a bunch of fixes, for
> > semantic correctness and efficiency, nearly ready to commit (the
> > Bioconductor folks have been doing some valuable testing).  These should
> > help, and more important perhaps it's fairly easy to see how dispatch in
> > this form can be tuned for performance if necessary.
> >
> > John
> >
> > Bill Dunlap wrote:
> >>>>     >> There have been propositions to make "+" work in S (and
> >>>>     >> R) like in some other languages, namely for character
> >>>>     >> (vectors),
> >>>>     >>
> >>>>     >> a + b := paste(a,b, sep="")
> >>>> ...
> >>>> yes.  I think however if we keep speed and clarity and catching
> >>>> user errors all in mind, it would be enough - and better - to
> >>>> only dispatch to paste(.,.) when both arguments are character
> >>>> (vectors), i.e., the above case needed
> >>>>  "a" + as.character(1:7) or "a" + paste(1:7) or "a" + format(1:7)
> >>>> which after all is really clearer, even more so for cases of
> >>>>  "1" + 2  which I'd rather keep giving errors.
> >>>>
> >>>> If  Char + Num  should work like above, then also
> >>>>     Num + Char  should (since after all,  "+" should be commutative
> >>>>                    apart from floating point precision issues).
> >>>>
> >>>> and so the internal C code gets a bit more complicated and slightly
> >>>> slower..  something we had in mind we should strongly avoid...
> >>>>
> >>> I doubt that it would be measurably slower, but I agree that requiring
> >>> both args to be Char could be done in fewer operations than just
> >>> requiring one.
> >>>
> >>> However, I think the consistency argument is stronger.  We have a rule
> >>> that operations on mixed types promote the more restrictive type to the
> >>> less restrictive one, and I don't think we should handle this case
> >>> differently.
> >>>
> >>> So I'd say we should allow all of Char + Num, Num + Char, and Char +
> >>> Char, or, if this costs too much at evaluation time, we shouldn't allow
> >>> any of them.
> >>>
> >> Currently doing arithmetic on mixed class data.frames
> >> produces useful warnings and errors.  E.g.,
> >>
> >>   > z <- data.frame(Factor=factor(c("Lo","Med","High")),
> >>                   Char=letters[1:3],
> >>                   Num1=exp(0:2),
> >>                   Num2=(1:3)*pi,
> >>                   stringsAsFactors=FALSE)
> >>   > z+1
> >>   Error in FUN(left, right) : non-numeric argument to binary operator
> >>   In addition: Warning message:
> >>   + not meaningful for factors in: Ops.factor(left, right)
> >>   > z[,-2] + 1
> >>     Factor     Num1      Num2
> >>   1     NA 2.000000  4.141593
> >>   2     NA 3.718282  7.283185
> >>   3     NA 8.389056 10.424778
> >>   Warning message:
> >>   + not meaningful for factors in: Ops.factor(left, right)
> >>
> >> If we made + do paste(sep="") for character+number then
> >> we would lose the messages and let garbage flow further
> >> down the pipe.
> >>
> >> Should factor data be treated as character data in this
> >> case (e.g., pasting to the levels)?  That would be weird,
> >> but many users confound character and factor data when
> >> they are buried in data.frames.
> >>
> >> ----------------------------------------------------------------------------
> >> Bill Dunlap
> >> Insightful Corporation
> >> bill at insightful dot com
> >> 360-428-8146
> >>
> >>  "All statements in this message represent the opinions of the author and do
> >>  not necessarily reflect Insightful Corporation policy or position."
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >>
> >


