[R] Accented characters, windows

Duncan Murdoch murdoch.duncan at gmail.com
Wed Mar 30 00:56:05 CEST 2016


On 29/03/2016 5:39 PM, Jan Kacaba wrote:
> I have a problem with accented characters. My OS is Win 8.1 and I'm
> using RStudio.
>
> I create a string:
> av="ěščřž"
>
> When I print "av" I get the result below.
>> av
> [1] "ìšèøž"
>
> The resulting characters are different. I have a similar problem when I
> write the string to a file. In RGUI, if I print "av" the characters are
> shown correctly, but using the "write" function to write the string to
> a file results in the same problem.
>
> Can you please help me deal with this?

You don't say what code page you're using.

R on Windows has a long-standing problem: it works mainly in the local
code page rather than in UTF-8, as most other systems do.  (This is
because UTF-8 was exotic, rather than ubiquitous as it is now, when the
internationalization was put in.)  So R can store UTF-8 strings on any
system, but for display it converts them to the local code page, and
that conversion can lose information if the characters aren't supported
locally.
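
To see what is going on in a given session, something like the
following may help.  It is only a sketch: the string is built from \u
escapes so the encoding of the script file doesn't matter, and plain
ASCII stands in for "a code page that lacks these letters", which is an
assumption made purely for illustration.

```r
# Build the string from \u escapes so the encoding of the script file
# itself cannot interfere.
av <- "\u011b\u0161\u010d\u0159\u017e"   # the five letters from the question

Encoding(av)                  # how R has marked the string internally
nchar(av, type = "chars")     # 5 characters
nchar(av, type = "bytes")     # 10 bytes, since each letter takes 2 in UTF-8

# What the session converts to for display.
Sys.getlocale("LC_CTYPE")
l10n_info()                   # on Windows this also reports the code page

# ASCII is used here only as an example of a target encoding that
# cannot represent the letters.
# With the default sub = NA, an unconvertible string comes back as NA ...
strict <- iconv(av, from = "UTF-8", to = "ASCII")
# ... while sub = "?" keeps going and substitutes for each bad byte.
lossy <- iconv(av, from = "UTF-8", to = "ASCII", sub = "?")
print(strict)
print(lossy)
```

The stored string is intact in both cases; only the converted copy has
lost the accents.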

With your string, I don't see the same thing as you do.  I see

"ešcrž"

which is also incorrect, but looks a little closer, because the
conversion finds a better approximation in my code page.

So if you think my result is better than yours, you could change your
system to code page 437, which is what I'm using, but that will
probably cause you worse problems.

Probably the only short-term solution that would be satisfactory is to
stop using Windows.  At some point the internal character handling in R
will need an overhaul, but that's a really big, really thankless job.
Perhaps Microsoft/Revolution will donate some programmer time to do it,
but more likely it will wait for volunteers in R Core.  I don't think
it will happen in 2016.
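
For the file-writing half of the question, a partial workaround may be
possible in the meantime.  This is a minimal sketch, assuming the
string is held as UTF-8 in R and that whatever reads the file expects
UTF-8; the temporary file path is just a placeholder.  It does nothing
for what the console displays.

```r
av <- "\u011b\u0161\u010d\u0159\u017e"   # the five letters from the question

path <- tempfile(fileext = ".txt")       # placeholder output location

# enc2utf8() makes sure the bytes are UTF-8; useBytes = TRUE asks
# writeLines() to pass those bytes through unchanged instead of
# re-encoding them to the native encoding.
writeLines(enc2utf8(av), path, useBytes = TRUE)

# Read the file back, declaring that its contents are UTF-8.
back <- readLines(path, encoding = "UTF-8")
print(identical(enc2utf8(back), enc2utf8(av)))
```

The file should then open correctly in any editor that is told to read
it as UTF-8.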

Duncan Murdoch


