[R] NEW: Sociolects in R

Charles Annis, P.E. Charles.Annis at StatisticalEngineering.com
Tue Apr 1 16:45:20 CEST 2008


Groovy!!!

Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Peter Dalgaard
Sent: Tuesday, April 01, 2008 10:19 AM
To: R help
Subject: [R] NEW: Sociolects in R

The R translation teams have done a great job in making R usable for
people who do not have English as their mother tongue. However, even
within English-speaking countries, there are groups which have trouble
with the language, and it may be valuable to support the Sociolects of
these groups too.
Thanks to a generous contribution from Lars Polifo, these features will
be made available in an upcoming version of R.

As it turns out, there are some particularly interesting challenges that
need to be addressed. Consider for instance the translation of the t
test in the locale en_SF_US.UTF8 (notice the interjection of the code
"SF" to denote "San Fernando Valley"):

t.test(extra ~ group, oh, baby, data = sleep)

        Welch Two Sample t-test

data:  extra by group
t = -1.8608, like, df = 17.776, like, wow, p-value = 0.0794
alternative hypothesis: true difference in means is like, ya know, not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2
           0.75            2.33



Notice that in addition to the simple message string modifications, it
has been necessary to modify the parser so as to delete obviously
superfluous arguments such as "oh" or "baby" (a particular issue here is
that the argument "like" might actually be intended to mean likelihood).
Similarly, for se_KC_SE.UTF8 (KC for "kitchen") we have alternate
spellings of arguments like "data":

t.test(ixtra ~ gruoop, deta = sleep)

        Velch Tvu Semple-a t-test

deta:  ixtra by gruoop
t = -1.8608, dff = 17.776, p-felooe-a = 0.0794
elterneteefe-a hypuzeesees: trooe-a deefffference-a in meuns is nut iqooel tu 0
95 percent cunffeedence-a interfel:
 -3.3654832  0.2054832
semple-a isteemetes:
meun in gruoop 1 meun in gruoop 2
           0.75            2.33

Canadian English poses particular problems, which have not yet been
resolved. Doing it properly would entail modifications to the R
language itself. For instance, we'd have to introduce a "four" loop
and change the end-brace to the four-character string "eh?}".

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907



