[Rd] Small encoding question

Simon Urbanek simon.urbanek at r-project.org
Fri Feb 15 01:17:54 CET 2008

I think I found the cause, but fixing it may be more complicated  
(other than a hot fix for this particular case).

What it boils down to is that the code for .check_package_code_syntax  
is trying to change the locale in a manner that doesn't work. In  
addition to that, the output of l10n_info() is wrong (for some  
definition of wrong), which complicates things even further.

To top it all off, everything is just fine when run in a UTF-8 locale.
That's why the package passes check on "regular" OS X, where a UTF-8
locale has been the default since Leopard.

.check_package_code_syntax() sees that the source requires Latin1, so
it checks whether the locale is UTF-8. It is not (because we force C),
so it switches to en_US. This may be the first problem, because en_US
is not necessarily a latin1 locale at all (en_US.ISO8859-1 would be
latin1 on OS X). The next problem is that l10n_info() reports FALSE
for Latin-1 even in the (correct) latin1 locale, and consequently(?)
the reading fails.
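
The decision described above can be sketched roughly as follows. This is
only an illustration of the logic as I describe it, not the actual code
in tools; pick_ctype_for_check() and its arguments are hypothetical names.

```r
# Hypothetical sketch of the decision described above (not the tools code).
# Returns the LC_CTYPE name the check would try to switch to, or NA if it
# would leave the locale alone.
pick_ctype_for_check <- function(declared_encoding, utf8_locale) {
  if (identical(declared_encoding, "latin1") && !utf8_locale) {
    # The name used according to the description; nothing guarantees
    # that a bare "en_US" is a Latin-1 locale on a given system.
    "en_US"
  } else {
    NA_character_
  }
}

pick_ctype_for_check("latin1", utf8_locale = FALSE)  # "en_US"
pick_ctype_for_check("latin1", utf8_locale = TRUE)   # NA
```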

ginaz:~$ echo 'Sys.getlocale(); l10n_info()' | LANG=en_US.ISO8859-1 R --vanilla --slave
[1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/ 



en_US.ISO8859-1 *is* a latin-1 locale ... I looked hard and found no
way to link (installed) locales to encodings. There is no official
mapping, and POSIX allows arbitrary locales (and names), so all locale
names are merely loose conventions. I'm therefore not sure how R can
even make such a decision (other than by parsing the name?).
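
For what it's worth, "parsing the name" could look something like the
sketch below. It is purely a heuristic that relies on the common
language_TERRITORY.codeset@modifier naming convention, which (as said
above) nothing guarantees. guess_codeset() and is_latin1_name() are
hypothetical helpers, not existing R functions.

```r
# Heuristic only: assumes the conventional "lang_TERR.codeset@modifier"
# layout of locale names. Both functions are hypothetical helpers.
guess_codeset <- function(locale_name) {
  # Drop an optional @modifier, then take whatever follows the first dot.
  base <- sub("@.*$", "", locale_name)
  if (!grepl(".", base, fixed = TRUE)) return(NA_character_)
  sub("^[^.]*\\.", "", base)
}

is_latin1_name <- function(locale_name) {
  cs <- guess_codeset(locale_name)
  # Normalise spellings such as "ISO8859-1", "ISO_8859-1", "latin1".
  !is.na(cs) && toupper(gsub("[-_]", "", cs)) %in% c("ISO88591", "LATIN1")
}

guess_codeset("en_US.ISO8859-1")   # "ISO8859-1"
guess_codeset("en_US")             # NA: the name alone says nothing
is_latin1_name("en_US.ISO8859-1")  # TRUE
is_latin1_name("fr_CA.UTF-8")      # FALSE
```

Note that a bare "en_US" yields NA here, which is exactly the ambiguity
in the check.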

Anyway - a quick fix would be to force the en_US.UTF-8 locale in that
check on Mac OS X, but I think that doesn't fix the underlying
problems ...
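
A minimal sketch of such a quick fix, assuming the check can simply be
wrapped: switch LC_CTYPE temporarily, fail loudly if the locale is not
installed, and restore the old setting afterwards. with_ctype() is a
hypothetical helper; only Sys.getlocale() and Sys.setlocale() are real R
functions here.

```r
# Hypothetical wrapper: evaluate expr with LC_CTYPE temporarily set to
# `locale`, restoring the previous value on exit.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request fails.
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(res, "")) stop("locale not available: ", locale)
  force(expr)  # expr is evaluated lazily, i.e. under the new locale
}

# Usage (the file name is made up for illustration):
# with_ctype("en_US.UTF-8", parse(file = "R/somefile.R"))
```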


On Feb 14, 2008, at 3:09 PM, Simon Urbanek wrote:

> On Feb 14, 2008, at 2:45 PM, Kurt Hornik wrote:
>>>>>>> Vincent Goulet writes:
>>> Dear developeRs,
>>> Compilation of the latest version (0.9-5) of my actuar package fails
>>> with r-release MacOS_X ix86 on CRAN; see
>>> 	http://www.R-project.org/nosvn/R.check/r-release-macosx-ix86/actuar-00check.html
>>> All errors come from accented letters in comments in latin-1 encoded
>>> files (except hierarc.R which is in UTF-8, my bad). Encoding is
>>> declared as latin-1 in DESCRIPTION.
>>> The package checks and compiles fine on Windows, Linux and,
>>> ironically, my MacOS X main development machine. I realize using
>>> non-ASCII characters in source files is not a good idea and I removed
>>> them, but I would appreciate any clue as to what went wrong with the
>>> compilation on CRAN.
>> I assume that the MacOS X builds are done in a C locale?
> Yes - but isn't this very similar to the problem we have been talking
> about a while back? The check analyses were reporting an error
> although the code was fine (I think it boiled down to text connection
> I/O in the check scripts failing mysteriously due to the fact that it
> was using the wrong encoding) I'll have to check later today ...
> Cheers,
> S
>>> FWIW,
>>>> sessionInfo()
>>> R version 2.6.2 (2008-02-08)
>>> i386-apple-darwin8.10.1
>>> locale:
>>> fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
>>> attached base packages:
>>> [1] stats     utils     datasets  grDevices graphics  methods   base
>>> other attached packages:
>>> [1] CarbonEL_0.1-4
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.6.2
>>> Thanks in advance!
>>> ---
>>>  Vincent Goulet, Associate Professor
>>>  École d'actuariat
>>>  Université Laval, Québec
>>>  Vincent.Goulet at act.ulaval.ca   http://vgoulet.act.ulaval.ca
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
