[Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

Duncan Murdoch murdoch.duncan at gmail.com
Fri Dec 12 13:06:31 CET 2014

On 12/12/2014, 7:34 AM, Jan Kim wrote:
> On Fri, Dec 12, 2014 at 06:01:22AM -0500, Duncan Murdoch wrote:
>> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
>>> Duncan Murdoch <murdoch.duncan at gmail.com> writes:
>>>> users of other languages may want to have messages and variable names
>>>> in their native language, and ASCII might not be enough for that.
>>> Allowing for messages in non-ASCII encodings would probably be a good
>>> idea, but I think allowing non-ASCII variable names is dangerous.
>> Dangerous in what way?
>> I agree that CRAN probably shouldn't accept packages like that, at least
>> for exported symbols:  packages there should run anywhere.  But I
>> suspect that the majority of R packages are for private use, and will
>> never be sent to CRAN.  Do you know any reason that non-ASCII names
>> would be dangerous for those?
>> Duncan Murdoch
> I would perhaps not go as far as calling them dangerous, but non-ASCII
> characters in code are a mixed blessing which personally I'd opt to not
> have, on balance. Being German I can understand that people may want
> umlauted characters in their variable names, but where this catches on,
> it's just a matter of time before people get characters into their code
> that are different but indistinguishable in the font they use (I've seen
> this with \H{o} rather than a \"{o}), and mega-personmonths are wasted
> puzzling over and tracking down these problems.
> While many packages are used in-house at least initially, making a
> package is a step towards releasing it, so I'd anticipate that having
> an option to support weeding out any potentially troublesome identifiers
> has the potential to do some good.

That's a good point.  I guess I'm thinking of Asian languages where the
transliteration into ASCII loses a lot of information, and (I'm told) is
uncomfortable for native speakers to read.  I think R should be usable
in those languages in a way that is comfortable for them, but they
should be warned that doing so limits portability.
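
To make the look-alike problem quoted above concrete, here is a minimal
sketch of the kind of check such a warning could be built on. It is
written in Python only to keep the illustration language-agnostic; it is
not an existing R facility, and the function name and the sample
identifiers are hypothetical. It flags identifiers that contain non-ASCII
characters and reports each character's code point and Unicode name, so
that an o with diaeresis (\"{o}) and an o with double acute (\H{o}) can
be told apart even when the font renders them alike.

```python
import unicodedata


def non_ascii_identifiers(names):
    """Map each name containing non-ASCII characters to a list of
    (character, code point, Unicode name) tuples for those characters."""
    report = {}
    for name in names:
        hits = [
            (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for ch in name
            if ord(ch) > 127  # anything outside 7-bit ASCII
        ]
        if hits:
            report[name] = hits
    return report


# Hypothetical identifiers; escapes keep this source file itself ASCII.
a = "h\u00f6he"  # o with diaeresis, as in \"{o}
b = "h\u0151he"  # o with double acute, as in \H{o}

report = non_ascii_identifiers(["height", a, b])

for name, hits in report.items():
    for ch, code_point, uni_name in hits:
        print(f"{name!r}: {code_point} {uni_name}")
```

The two flagged names compare unequal even though they may look the same
on screen, which is the situation described in the quoted message. A
check like this would only warn; whether to reject such names would
remain a policy choice for the package author or the repository.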

Duncan Murdoch
