[Rd] "+" for character method...

Latchezar Dimitrov ldimitro at wfubmc.edu
Sat Aug 26 20:45:52 CEST 2006


Hello,

I too have two comments here; please see between the lines below. I
believe they go in the same direction, though :-)

> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of John Chambers
> Sent: Saturday, August 26, 2006 10:26 AM
> To: Bill Dunlap
> Cc: R-devel at r-project.org; Duncan Murdoch; Martin Maechler
> Subject: Re: [Rd] "+" for character method...
> 
> Well, two comments, in two non-compatible directions.
> 
> 1.  I have to say that I find the idea of using "+" to paste 
> character strings together aesthetically ugly.
> 
> IMO, one thing that makes functional object-based languages 
> attractive is that the generic function retains a consistent 
> _function_, that is, purpose and meaning, of which the 
> methods are implementations.
> 
> It escapes me totally why I should think of pasting strings 
> as addition in the mathematical or intuitive sense (as Brian 
> points out re commutativity, it fails a number of axiomatic 
> properties).  And if so, what about "-", "*", "/" and so on?
> The mind boggles.

Algebraic thoughts came to me immediately at that point in your e-mail;
however, I kept reading and saw Gabor's and Duncan's comments (I mention
this for priority reasons only :-)

1. "-", "*", "/" definitions (assuming one knows what "+" means :-)

"-" is commonly defined via the (unique) solution x of a+x=b
("handedness" could be taken into account, since a+x=b and x+a=b are
different equations when "+" is not commutative)

"*" does not necessarily have to be a homogeneous (same-type)
operation; cf. scalar multiplication in vector spaces

"/" - see "-" above
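As a small illustration of defining "-" through a+x=b, here is a
minimal R sketch that assumes "+" on strings means paste0(a, b). Under
that assumption a solution x exists only when a is a prefix of b (and
is then unique), while the right-handed equation x+a=b needs a suffix
instead. The helper names str_ldiff() and str_rdiff() are hypothetical,
and startsWith()/endsWith() require R >= 3.3.0.

```r
# Hypothetical helpers; assumes "+" on strings means paste0(a, b).

# Left difference: the x solving paste0(a, x) == b, or NA if none exists.
str_ldiff <- function(b, a) {
  ifelse(startsWith(b, a), substring(b, nchar(a) + 1L), NA_character_)
}

# Right difference: the x solving paste0(x, a) == b, or NA if none exists.
str_rdiff <- function(b, a) {
  ifelse(endsWith(b, a), substring(b, 1L, nchar(b) - nchar(a)), NA_character_)
}

str_ldiff("foobar", "foo")  # "bar"
str_rdiff("foobar", "bar")  # "foo"
str_ldiff("foobar", "bar")  # NA: "bar" is not a prefix of "foobar"
```

The NA case is where strings differ from numbers: subtraction is only
partially defined, and which side you strip depends on the handedness
of the equation.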

2. Now, I totally agree on the commutativity issue and would like to
make another suggestion inspired by (if not a logical consequence of)
point 1 above.

It is common to denote a non-commutative operation by "*"
(multiplication), so that is my suggestion if you want to keep R's
"math" spirit. If you want to follow VB and its kin, then I don't care.

Then I'd like to have a string "^" (a string exponentiation operation,
that is), obviously taking (string, integer) arguments, without any
confusion AFAIR.
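The proposed "*" (concatenation) and "^" (repetition by a non-negative
integer) can be sketched in R without touching the base operators, by
using two hypothetical infix operators, %s*% and %s^%. This is only an
illustration of the algebra under those assumptions; strrep() requires
R >= 3.3.0.

```r
# Hypothetical operators sketching the proposed string algebra:
# "*" as (non-commutative) concatenation, "^" as repetition by an integer.
`%s*%` <- function(a, b) paste0(a, b)

`%s^%` <- function(s, n) {
  # Only whole, non-negative exponents make sense for repetition.
  stopifnot(is.character(s), n == as.integer(n), n >= 0)
  strrep(s, n)
}

"ab" %s*% "cd"                 # "abcd"
"cd" %s*% "ab"                 # "cdab": order matters, not commutative
"ab" %s^% 3                    # "ababab"
"ab" %s^% 0                    # "": the empty string is the identity
("ab" %s^% 2) %s*% "ab"        # "ababab", same as "ab" %s^% 3
```

The last line is one instance of the law a^m * a^n = a^(m+n), which
holds for repetition just as it does for numeric powers.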

"Root" I'll leave for higher level string algebra :-)))

Regards,
Latchezar Dimitrov

PS. The character-string semigroup is well known in mathematics.
PPS. I'm an analyst/programmer by occupation and a mathematician and
(independently from math) a computer scientist by education.
PPPS. Thanks for bearing with my thoughts above. BTW, I am still waiting
for some knowledgeable and good-willed person to help me compile
64-bit R on Sun Solaris 10 on an amd64 machine. I dug in as far as
integer/pointer/floating-point sizes and accuracy in lazyload() (or
whatever the exact spelling is). I can provide complete details if wanted.

> 
> Its excuse presumably is to save typing, but I would favor 
> using some %thing% operator at the cost of a couple of extra 
> key strokes.
> 
> 2.  Having said that,  it's a reasonable hope that efficiency 
> of dispatch will not be a serious problem.  There are a bunch 
> of fixes, for semantic correctness and efficiency, nearly 
> ready to commit (the Bioconductor folks have been doing some 
> valuable testing).  These should help, and more important 
> perhaps it's fairly easy to see how dispatch in this form can 
> be tuned for performance if necessary.
> 
> John
> 
> Bill Dunlap wrote:
> >>>     >> There have been propositions to make "+" work in S (and
> >>>     >> R) like in some other languages, namely for character
> >>>     >> (vectors),
> >>>     >>
> >>>     >> a + b := paste(a,b, sep="")
> >>> ...
> >>> yes.  I think however if we keep speed and clarity and catching
> >>> user errors all in mind, it would be enough - and better - to
> >>> only dispatch to paste(.,.) when both arguments are character
> >>> (vectors), i.e., the above case needed
> >>>  "a" + as.character(1:7) or "a" + paste(1:7) or "a" + format(1:7)
> >>> which after all is really clearer, even more so for cases such as
> >>>  "1" + 2  which I'd rather keep giving errors.
> >>>
> >>> If  Char + Num  should work like above, then also
> >>>     Num + Char  should (since after all, "+" should be commutative
> >>> 			apart from floating point precision issues).
> >>>
> >>> and so the internal C code gets a bit more complicated and slightly
> >>> slower... something we had in mind we should strongly avoid...
> >>>       
> >> I doubt that it would be measurably slower, but I agree that requiring
> >> both args to be Char could be done in fewer operations than just
> >> requiring one.
> >>
> >> However, I think the consistency argument is stronger.  We have a rule
> >> that operations on mixed types promote the more restrictive type to the
> >> less restrictive one, and I don't think we should handle this case
> >> differently.
> >>
> >> So I'd say we should allow all of Char + Num, Num + Char, and
> >> Char + Char, or, if this costs too much at evaluation time, we
> >> shouldn't allow any of them.
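The two candidate rules in this exchange can be contrasted with a small
R sketch. Both functions are hypothetical stand-ins for what a
character-aware internal "+" might do (base R's "+" is not changed):
plus_strict() follows Martin's suggestion of pasting only when both
arguments are character, and plus_promote() follows Duncan's
consistency argument of promoting the numeric side to character.

```r
# Hypothetical stand-ins for a character-aware "+"; base R's "+" is unchanged.

# Martin's rule: paste only when BOTH arguments are character;
# mixed types keep an error like the one base R gives today.
plus_strict <- function(a, b) {
  if (is.character(a) && is.character(b)) return(paste0(a, b))
  if (is.character(a) || is.character(b))
    stop("non-numeric argument to binary operator")
  a + b
}

# Duncan's rule: promote the more restrictive type (numeric) to character,
# so Char + Num, Num + Char and Char + Char all paste.
plus_promote <- function(a, b) {
  if (is.character(a) || is.character(b)) return(paste0(a, b))
  a + b
}

plus_strict("a", as.character(1:3))  # "a1" "a2" "a3"
plus_promote("a", 1:3)               # "a1" "a2" "a3"
plus_promote(1, "a")                 # "1a": allowed, but order matters
```

Under the strict rule "1" + 2 keeps its error; under the promoting rule
it would quietly become "12".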
> >>     
> >
> > Currently doing arithmetic on mixed class data.frames
> > produces useful warnings and errors.  E.g.,
> >
> >   > z <- data.frame(Factor=factor(c("Lo","Med","High")),
> >                   Char=letters[1:3],
> >                   Num1=exp(0:2),
> >                   Num2=(1:3)*pi,
> >                   stringsAsFactors=FALSE)
> >   > z+1
> >   Error in FUN(left, right) : non-numeric argument to binary operator
> >   In addition: Warning message:
> >   + not meaningful for factors in: Ops.factor(left, right)
> >   > z[,-2] + 1
> >     Factor     Num1      Num2
> >   1     NA 2.000000  4.141593
> >   2     NA 3.718282  7.283185
> >   3     NA 8.389056 10.424778
> >   Warning message:
> >   + not meaningful for factors in: Ops.factor(left, right)
> >
> > If we made + do paste(sep="") for character+number then
> > we would lose the messages and let garbage flow further
> > down the pipe.
> >
> > Should factor data be treated as character data in this
> > case (e.g., pasting to the levels)?  That would be weird,
> > but many users confound character and factor data when
> > they are buried in data.frames.
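Bill's question about factors can be checked directly: paste() converts
its arguments with as.character(), which for a factor returns the level
labels rather than the integer codes, so a paste-based "+" would
quietly paste onto the labels. A minimal sketch:

```r
f <- factor(c("Lo", "Med", "High"))

suppressWarnings(f + 1)     # NA NA NA: Ops.factor refuses (and warns)
paste0(f, "!")              # "Lo!" "Med!" "High!": pasted onto the labels
as.integer(f)               # 2 3 1: codes, since levels sort to High, Lo, Med
paste0(as.integer(f), "!")  # "2!" "3!" "1!"
```

So whether a factor behaves like "character" here depends entirely on
which conversion the operator happens to apply.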
> >
> > 
> > ----------------------------------------------------------------------------
> > Bill Dunlap
> > Insightful Corporation
> > bill at insightful dot com
> > 360-428-8146
> >
> >  "All statements in this message represent the opinions of the author and do
> >  not necessarily reflect Insightful Corporation policy or position."
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >   
> 
>


