[Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

Tomas Kalibera
Wed Apr 10 13:26:23 CEST 2019


On 4/10/19 1:14 PM, Jeroen Ooms wrote:
> On Wed, Apr 10, 2019 at 12:19 PM Tomáš Bořil <borilt using gmail.com> wrote:
>> Minimalistic example:
>> Let's type "ř" (LATIN SMALL LETTER R WITH CARON) in RGui console:
>>> "ř"
>> [1] "r"
>>
>> Although the script is in UTF-8, the characters are silently replaced
>> by "simplified" substitutes (which ones depends on the OS locale). The
>> same happens when the statements are simply typed into the R console.
>>
>> The problem does not occur on OSes with a UTF-8 locale (macOS, Linux...)
> I think this is a "feature" of win_iconv, which is bundled with base R
> on Windows (./src/extra/win_iconv). The character from your example is
> not part of the latin1 (iso-8859-1) set, but win_iconv seems to
> convert it anyway:
>
>> x <- "\U00159"
>> print(x)
> [1] "ř"
>> iconv(x, 'UTF-8', 'iso-8859-1')
> [1] "r"
>
> On macOS, iconv tells us this character cannot be represented as latin1:
>
>> x <- "\U00159"
>> print(x)
> [1] "ř"
>> iconv(x, 'UTF-8', 'iso-8859-1')
> [1] NA
>
> I'm actually not sure why base-R needs win_iconv (but I'm not an
> encoding expert at all). Perhaps we could try to unbundle it and use
> the standard libiconv provided by the Rtools toolchain bundle to get
> more consistent results.
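
For reference, that "ř" falls outside latin1 can be checked without going through iconv at all, since ISO-8859-1 covers exactly the code points U+0000 to U+00FF. A minimal sketch (the helper name fits_latin1 is made up for illustration and is not part of base R):

```r
# Hypothetical helper: TRUE for each string whose code points all lie in
# U+0000..U+00FF, i.e. the range covered by ISO-8859-1.
fits_latin1 <- function(x) {
  vapply(
    enc2utf8(x),
    function(s) all(utf8ToInt(s) <= 0xFF),
    logical(1),
    USE.NAMES = FALSE
  )
}

# U+0159 (r with caron) is out of range; U+00E9 (e with acute) and "r" are not.
fits_latin1(c("\u0159", "\u00e9", "r"))
# [1] FALSE  TRUE  TRUE
```

Because this looks only at code points, the answer is the same on Windows, macOS and Linux, whatever the bundled iconv does.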

win_iconv just calls into the Windows API to do the conversion. It is
technically easy to disable the "best fit" conversion, but I don't think
that would be a good idea. In some cases, perhaps rare, the best fit is
good, including the conversion from "ř" to "r", which makes perfect
sense. More importantly, changing the behavior could affect users who
expect the substitution to happen because it has been happening for
many years, and it wouldn't help others much.
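
For code that must not rely on the substitution, one option is to detect it from R rather than change win_iconv: convert, convert back, and treat any string that does not survive the round trip as unrepresentable. A rough sketch, assuming nothing beyond base R's iconv() (strict_latin1 is a hypothetical name, not an existing function):

```r
# Hypothetical wrapper: convert UTF-8 to ISO-8859-1, but return NA for any
# element that iconv could only convert lossily (for example by best fit).
strict_latin1 <- function(x) {
  x <- enc2utf8(x)
  y <- iconv(x, from = "UTF-8", to = "ISO-8859-1")
  # Convert back; a lossless conversion must reproduce the input exactly.
  back <- iconv(y, from = "ISO-8859-1", to = "UTF-8")
  lossy <- is.na(y) | is.na(back) | back != x
  y[lossy] <- NA_character_
  y
}

is.na(strict_latin1(c("\u0159", "\u00e9")))
# [1]  TRUE FALSE
```

With this, "\u0159" yields NA both where iconv best-fits it to "r" and where iconv already returns NA, so callers see the same result on either platform while the default behavior of iconv() stays as it is.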

Tomas
