[Rd] \U with more than 4 digits returns the wrong character

Duncan Murdoch murdoch.duncan at gmail.com
Thu Dec 4 21:34:32 CET 2014


On 04/12/2014, 2:00 PM, Richard Cotton wrote:
> If I type a character using \U syntax that has more than 4 digits, I
> get the wrong character.  For example,
> 
> "\U1d4d0"
> 
> should print a mathematical bold script capital A.  See
> http://www.fileformat.info/info/unicode/char/1d4d0/index.htm
> 
> On my machine, it prints the Hangul character corresponding to
> 
> "\Ud4d0"
> http://www.fileformat.info/info/unicode/char/d4d0/index.htm
> 
> It seems that the hex-digit part is overflowing at 16^4.
> 
> I tested this on R 3.1.2 and devel (2014-12-03 r67101) x64 under
> Windows.  I played around with Sys.setlocale and options("encoding"),
> but couldn't get the expected value.
> 
> Can others reproduce this?  It feels like a bug, but experience tells
> me I probably have something silly going on with my setup.
> 

I can reproduce this on Windows, but not on OS X.  On Windows:

> as.hexmode(utf8ToInt("\U1d4d0"))
[1] "d4d0"

On OS X:

> as.hexmode(utf8ToInt("\U1d4d0"))
[1] "1d4d0"

I'll see if I can find where the truncation is happening on Windows.
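
For what it's worth, the two results above are what you would expect if
only the low 16 bits of the code point survived, which fits Richard's
guess about an overflow at 16^4.  A minimal sketch of that arithmetic
(an illustration only; the variable names are mine, and it says nothing
about where in R the truncation actually happens):

```r
# Illustration only: keep just the low 16 bits of the code point.
cp <- 0x1D4D0L                  # U+1D4D0, the code point from the report
low16 <- bitwAnd(cp, 0xFFFFL)   # discard everything above bit 15
format(as.hexmode(low16))       # "d4d0", the value seen on Windows

# Equivalently, the remainder modulo 16^4
cp %% 16L^4 == low16            # TRUE
```

So "d4d0" is exactly "1d4d0" with the leading hex digit dropped, not an
unrelated wrong value.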

Duncan Murdoch


