[Rd] "+" for character method...

Duncan Murdoch murdoch at stats.uwo.ca
Sat Aug 26 02:01:07 CEST 2006


On 8/25/2006 6:52 PM, Bill Dunlap wrote:
>>>     >> There have been propositions to make "+" work in S (and
>>>     >> R) like in some other languages, namely for character
>>>     >> (vectors),
>>>     >>
>>>     >> a + b := paste(a,b, sep="")
>>> ...
>>> yes.  I think however if we keep speed and clarity and catching
>>> user errors all in mind, it would be enough - and better - to
>>> only dispatch to paste(.,.) when both arguments are character
>>> (vectors), i.e., the above case needed
>>>  "a" + as.character(1:7) or "a" + paste(1:7) or "a" + format(1:7)
>>> which after all is really much clearer, even more so for cases of
>>>  "1" + 2  which I'd rather want keeping to give errors.
>>>
>>> If  Char + Num  should work like above, then also
>>>     Num + Char  should (since after all,  "+" should be commutative
>>>     apart from floating point precision issues).
>>>
>>> and so the internal C code gets a bit more complicated and slightly
>>> slower..  something we had in mind we should strongly avoid...
>> I doubt that it would be measurably slower, but I agree that requiring
>> both args to be Char could be done in fewer operations than just
>> requiring one.
>>
>> However, I think the consistency argument is stronger.  We have a rule
>> that operations on mixed types promote the more restrictive type to the
>> less restrictive one, and I don't think we should handle this case
>> differently.
>>
>> So I'd say we should allow all of Char + Num, Num + Char, and Char +
>> Char, or, if this costs too much at evaluation time, we shouldn't allow
>> any of them.
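
For concreteness, the two alternatives above can be sketched with ordinary
infix functions. This is only an illustration under stated assumptions: the
operator names %+s% (strict) and %+p% (permissive) are hypothetical, they are
not part of R, and nothing here changes the built-in "+".

```r
# Hypothetical operators for illustration only; not part of base R.

# Strict variant: concatenate only when both operands are character.
`%+s%` <- function(a, b) {
  if (is.character(a) && is.character(b)) {
    paste(a, b, sep = "")
  } else {
    stop("non-character argument to string concatenation")
  }
}

# Permissive variant: if either operand is character, let paste()
# convert the other one; otherwise fall back to ordinary addition.
`%+p%` <- function(a, b) {
  if (is.character(a) || is.character(b)) {
    paste(a, b, sep = "")
  } else {
    a + b
  }
}

strict_ok  <- "a" %+s% as.character(1:3)   # explicit conversion required
strict_err <- tryCatch("1" %+s% 2, error = function(e) "error")
perm_left  <- "a" %+p% (1:3)               # Char + Num
perm_right <- (1:3) %+p% "a"               # Num + Char
perm_mixed <- "1" %+p% 2                   # no error; gives "12"
```

The strict variant keeps "1" + 2 an error, while the permissive one quietly
returns "12", which is the trade-off under discussion.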
> 
> Currently doing arithmetic on mixed class data.frames
> produces useful warnings and errors.  E.g.,
> 
>   > z <- data.frame(Factor=factor(c("Lo","Med","High")),
>                   Char=letters[1:3],
>                   Num1=exp(0:2),
>                   Num2=(1:3)*pi,
>                   stringsAsFactors=FALSE)
>   > z+1
>   Error in FUN(left, right) : non-numeric argument to binary operator
>   In addition: Warning message:
>   + not meaningful for factors in: Ops.factor(left, right)
>   > z[,-2] + 1
>     Factor     Num1      Num2
>   1     NA 2.000000  4.141593
>   2     NA 3.718282  7.283185
>   3     NA 8.389056 10.424778
>   Warning message:
>   + not meaningful for factors in: Ops.factor(left, right)
> 
> If we made + do paste(sep="") for character+number then
> we would lose the messages and let garbage flow further
> down the pipe.

Yes, I agree, that's a negative.  But it is consistent with what we do 
elsewhere, and consistency is a good thing:

 > z > 1
   Factor Char  Num1 Num2
1     NA TRUE FALSE TRUE
2     NA TRUE  TRUE TRUE
3     NA TRUE  TRUE TRUE
Warning message:
 > not meaningful for factors in: Ops.factor(left, right)

We get the warning for the factor column, but not the character column.
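
A minimal sketch of the promotion that makes the Char column silently
comparable; the variable names are arbitrary and only for illustration:

```r
# Comparison promotes the numeric operand to character, then compares strings.
cmp_lt <- 2 < "10"     # compared as "2" < "10", so not a numeric comparison
cmp_eq <- 1 == "1"     # compared as "1" == "1"

# c() applies the same promotion when types are mixed.
mixed <- c(1, "a")

# The Char column of z is compared the same way: letters[1:3] > "1".
# The result of that comparison depends on the collation order of the locale.
cmp_char <- letters[1:3] > 1
```

Here cmp_lt is FALSE because "2" sorts after "10" as a string, cmp_eq is
TRUE, and mixed is the character vector "1" "a".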

But is it really common to add values to a data.frame?  Are we going to 
protect anyone from an error they would really make?

> Should factor data be treated as character data in this
> case (e.g., pasting to the levels)?  That would be weird,
> but many users confound character and factor data when
> they are buried in data.frames.

I'd be happy to continue to have the warning in that case.  paste()
accepts almost anything, so there would still be many cases where
paste(x, y, sep="") gives a result but x + y gives a warning or error.
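
A few such cases, as a rough sketch of current behaviour (variable names are
arbitrary; the exact wording of the error message may vary with the R version
and locale, so only its presence is captured here):

```r
# paste() converts whatever it is given to character.
p_num    <- paste("a", 1:3, sep = "")                  # numbers are converted
p_factor <- paste(factor(c("Lo", "Med")), 1, sep = "") # factor labels are used
p_empty  <- paste(character(0), "a", sep = "")         # zero-length acts as ""

# "+" refuses a character operand outright.
plus_err <- tryCatch("a" + 1, error = function(e) conditionMessage(e))

# "+" on a factor warns (suppressed here) and returns NA.
plus_factor <- suppressWarnings(factor(c("Lo", "Med")) + 1)
```

p_num is "a1" "a2" "a3", p_factor is "Lo1" "Med1", and p_empty is "a",
whereas the two "+" calls give an error and a vector of NA respectively.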

Duncan Murdoch

> 
> ----------------------------------------------------------------------------
> Bill Dunlap
> Insightful Corporation
> bill at insightful dot com
> 360-428-8146
> 
>  "All statements in this message represent the opinions of the author and do
>  not necessarily reflect Insightful Corporation policy or position."
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


