[R] about the char _

Duncan Murdoch murdoch at stats.uwo.ca
Fri Oct 5 17:18:06 CEST 2001


On Fri, 5 Oct 2001  Murray Jorgensen wrote:

> Just one small point: the assignment symbol must be about the most commonly
> used symbol in any computer language. Yet in R and S+ it is represented by
> a two-character string with, moreover, a change in case! 

I was curious about this, so I decided to do a quick character count
on the characters in the R library source.  I didn't have the current
version handy, so I did it on the version 1.2.2 source, consisting of
about 1.3 million characters.

This isn't counting symbols, it's counting characters, which is a lot
easier!

The 10 most frequent characters, listed in ascending order of count, were:

  (0):  38323
  (n):  39921
  (i):  40075
  (s):  40947
  (a):  41137
  (t):  44049
  (.):  49665
  (e):  55619
  (,):  70619
  (SP): 313061

Of these, the only one that is almost certainly a symbol is the comma,
but it sure looks like whitespace is another contender!
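
A count like this is easy to repeat on any source tree. The following is a
minimal sketch in Python, not the tool used for the figures above; the
function names char_counts and top_n and the sample string are hypothetical,
chosen only for illustration. It tallies every character and prints the most
frequent ones in ascending order, labelling the space as SP as in the table.

```python
from collections import Counter

def char_counts(text):
    """Tally every character in text, whitespace included."""
    return Counter(text)

def top_n(counts, n=10):
    """Return the n most frequent (char, count) pairs, smallest count first."""
    return sorted(counts.most_common(n), key=lambda pair: pair[1])

# Hypothetical sample; in practice this would be the concatenated source files.
sample = "x <- c(1, 2, 3)\ny <- mean(x)\n"
counts = char_counts(sample)

# Printable labels for characters that would otherwise be invisible.
labels = {" ": "SP", "\n": "NL", "\t": "TAB"}

for ch, k in top_n(counts, 5):
    print(f"  ({labels.get(ch, ch)}): {k:6d}")
```

Because this counts characters rather than parsed symbols, a two-character
token such as the assignment arrow is split across two separate tallies.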

If we restrict the counts to the single characters that are usually
symbols (in a really loose sense of "usually" and "symbols" :-), we
see:

  (]):   5731
  ([):   5799
  (=):  12183
  (<):  13897
  (-):  16985
  ("):  23380
  ()):  34756
  (():  34772
  (,):  70619

So assignment could well be in the top 10, but probably isn't in the
top 3.  And for what it's worth, the underscore was way down at a
count of 140.
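
The reasoning behind that estimate can be made explicit: every "<-" uses one
"<" and one "-", so the number of arrows can be no larger than the smaller of
the two single-character counts (here at most 13897, the count of "<"). The
sketch below, again in Python and with a hypothetical function name and
sample text, counts the two-character string directly and reports that upper
bound alongside it. It is a plain substring count, not a parse of R code, so
it would also pick up arrows inside strings and comments.

```python
def assignment_bounds(text):
    """Return (occurrences of '<-', upper bound from single-character counts)."""
    less_than = text.count("<")
    minus = text.count("-")
    arrows = text.count("<-")  # literal two-character substring, no parsing
    return arrows, min(less_than, minus)

# Hypothetical sample: two assignments, one comparison, one subtraction.
sample = "x <- 1\nif (x < 2) y <- x - 1\n"
arrows, upper = assignment_bounds(sample)
print("arrows:", arrows, "upper bound:", upper)
```

The gap between the two numbers comes from uses of "<" and "-" outside
assignment, such as comparisons, subtraction, and negative constants.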

Back to our regularly scheduled programming...

Duncan Murdoch