[Rd] String encoding problem

Hadley Wickham h.wickham at gmail.com
Thu Jul 7 17:40:41 CEST 2016


On Thu, Jul 7, 2016 at 10:11 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 07/07/2016 10:57 AM, Hadley Wickham wrote:
>>
>> If you print:
>>
>> "\xc9\x82\xbf"
>>
>> you get
>>
>>  "\u0242\xbf"
>>
>> But if you try and evaluate that string you get:
>>
>>>  "\u0242\xbf"
>>
>> Error: mixing Unicode and octal/hex escapes in a string is not allowed
>>
>> (Probably will only happen on mac/linux with default utf-8 encoding)
>
>
> I'm not sure what should happen here, but that's not a legal string in a
> UTF-8 locale, so it's not too surprising that things go wonky.

Here's a bit more context on how I got that sequence of bytes:

x <- "こんにちは"
y <- iconv(x, to = "Shift-JIS")
Encoding(y)
y

I did this to create an example to demonstrate how to handle encoding
problems, and it's a bit frustrating that I have to manually mangle the
string in order to be able to re-use it in another session.  Maybe
strings with unknown encoding shouldn't be printed with Unicode escapes?
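
One possible workaround, as a minimal sketch only: carry the string
between sessions as raw bytes instead of its printed form, so the source
text stays pure ASCII and no escapes get mixed. The byte values are the
Shift-JIS encoding of the string above (what charToRaw(y) would show);
the names `bytes` and `z` are made up for illustration, and the final
conversion assumes the platform's iconv knows the name "Shift-JIS".

```r
# Shift-JIS bytes of the example string (five two-byte characters)
bytes <- as.raw(c(0x82, 0xb1, 0x82, 0xf1, 0x82, 0xc9, 0x82, 0xbf, 0x82, 0xcd))

# Rebuild the string from its bytes; no encoding is declared for it
z <- rawToChar(bytes)
Encoding(z)

# The bytes survive the round trip unchanged
identical(charToRaw(z), bytes)

# Convert back to UTF-8 (platform-dependent: needs iconv support
# for the "Shift-JIS" name)
iconv(z, from = "Shift-JIS", to = "UTF-8")
```

This avoids pasting the printed representation at all, at the cost of a
less readable example.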

Hadley

-- 
http://hadley.nz


