[R] Accented characters, windows

Jan Kacaba jan.kacaba at gmail.com
Wed Mar 30 21:42:47 CEST 2016


Duncan, thank you for your reply. My encoding is:

> Sys.getlocale('LC_CTYPE')
[1] "Czech_Czech Republic.1250"

In RStudio I use UTF-8. I also tried other recommended encodings, but some
characters are still misrepresented.

I've found a solution to this. To display strings correctly in RStudio I have
to convert them from the locale's code page to UTF-8:
iconv(x,"CP1250","UTF-8")
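
For illustration, a minimal sketch of that conversion. It assumes the string
holds CP1250 bytes; the bytes are built by hand so the example does not depend
on the session locale, and the variable names are made up for the example.
Reading the same bytes as CP1252 gives the garbled string from the original
report, which suggests (but does not prove) that this is what happened there.

```r
# "ěščřž" as raw CP1250 bytes (built by hand, independent of the session locale)
cp1250_bytes <- as.raw(c(0xEC, 0x9A, 0xE8, 0xF8, 0x9E))
x <- rawToChar(cp1250_bytes)

# State the real source encoding and convert to UTF-8
x_utf8 <- iconv(x, from = "CP1250", to = "UTF-8")
Encoding(x_utf8)   # "UTF-8"
nchar(x_utf8)      # 5

# The same bytes misread as CP1252 come out as "ìšèøž"
x_wrong <- iconv(x, from = "CP1252", to = "UTF-8")
```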

If I want to write a string to a file:
zz=file("myfile.txt", "w", encoding="UTF-8")
cat(x,file = zz, sep = "\n")
close(zz)

It seems there is no need to use iconv() if I just need to write the string to a
file.
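
As a cross-check, here is a sketch of a different technique from the one above:
convert to UTF-8 first with enc2utf8() and write the bytes unchanged with
writeLines(..., useBytes = TRUE), instead of letting the connection re-encode.
The temporary file and the variable names are assumptions for the example.

```r
x <- "\u011b\u0161\u010d\u0159\u017e"   # "ěščřž" written with Unicode escapes
path <- tempfile(fileext = ".txt")      # hypothetical output file

con <- file(path, open = "wb")          # binary mode: no newline translation
writeLines(enc2utf8(x), con, useBytes = TRUE)  # write the UTF-8 bytes as they are
close(con)

# Read the line back, declaring it as UTF-8, and compare byte for byte
back <- readLines(path, encoding = "UTF-8")
identical(charToRaw(back), charToRaw(enc2utf8(x)))
```

Comparing raw bytes avoids any dependence on how the console displays the
characters.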

I hope there is no problem processing the strings with other functions such as
paste, strsplit or grep, though.
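
One way to check this, as a sketch only: run those functions on a UTF-8 string
and see whether the characters survive a round trip. The test string and the
variable names are assumptions for the example, and a string left in the native
code page may behave differently.

```r
x <- "\u011b\u0161\u010d\u0159\u017e"      # "ěščřž" written with Unicode escapes

n_chars <- nchar(x)                        # counts characters, not bytes
chars <- strsplit(x, "")[[1]]              # split into single characters
joined <- paste(chars, collapse = "")      # glue them back together
has_s <- grepl("\u0161", x, fixed = TRUE)  # look for "š"

# The round trip through strsplit and paste should give back the same bytes
identical(charToRaw(enc2utf8(joined)), charToRaw(enc2utf8(x)))
```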

Derek

2016-03-30 0:56 GMT+02:00 Duncan Murdoch <murdoch.duncan at gmail.com>:

> On 29/03/2016 5:39 PM, Jan Kacaba wrote:
>
>> I have a problem with accented characters. My OS is Win 8.1 and I'm using
>> RStudio.
>>
>> I make a string:
>> av="ěščřž"
>>
>> When I call "av" I get the result below.
>>
>> > av
>> [1] "ìšèøž"
>>
>> The resulting characters are different. I have a similar problem when I
>> write a string to a file. In RGUI, if I call "av" it prints the characters
>> correctly, but using the "write" function to print the string to a file
>> results in the same problem.
>>
>> Can you please help me how to deal with it?
>>
>
> You don't say what code page you're using.
>
> R in Windows has a long standing problem that it works mainly in the local
> code page, rather than working in UTF-8 as most other systems do.  (This is
> due to the fact that when the internationalization was put in, UTF-8 was
> exotic, rather than ubiquitous as it is now.)  So R can store UTF-8 strings
> on any system, but for display it converts them to the local code page, and
> that conversion can lose information if the characters aren't supported
> locally.
>
> With your string, I don't see the same thing as you, I see
>
> "ešcrž"
>
> which is also incorrect, but looks a little closer, because it does a
> better approximation in my code page.
>
> So if you think my result is better than yours, you could change your
> system to code page 437 as I'm using, but that will probably cause you
> worse problems.
>
> Probably the only short term solution that would be satisfactory is to
> stop using Windows.  At some point in the future the internal character
> handling in R needs an overhaul, but that's a really big, really thankless
> job.  Perhaps Microsoft/Revolution will donate some programmer time to do
> it, but more likely, it will wait for volunteers in R Core to do it.  I
> don't think it will happen in 2016.
>
> Duncan Murdoch
>
