[R] Writing escaped unicode

Jan T Kim jttkim at googlemail.com
Tue Dec 11 11:49:18 CET 2012


On Mon, Dec 10, 2012 at 11:46:40PM -0500, David Kulp wrote:
> I'd like to write unicode strings using the "\u" escape syntax.  According to the documentation, print.default or encodeString will escape unicode using the \u convention.  In practice, I can't make it work.
> 
> > b="Unicode character: \ufffd"
> > print.default(b)
> [1] "Unicode character: ???"
> > encodeString(b)
> [1] "Unicode character: ???"
> 
> I want to write the string back out in the same escape formatting as I read it in.  This is because I'm interfacing with some Ruby code that requires unicode to be in this escaped format.

As I read the documentation, encodeString escapes control characters,
but not "unicode characters". The notion of a "unicode character" is
not entirely well defined, considering that the very mission of the
unicode consortium is to make sure that there are no non-unicode
characters...  ;-)

From this it follows that replacing all characters with their \uxxxx
representation, e.g. by

    paste(sprintf("\\u%04x", utf8ToInt(b)), collapse = "")

should work with the Ruby client you are trying to talk to. Obviously, this
bloats the string rather more than necessary (particularly if most of
the characters are in the ASCII range), but if the volume you're
piping into the client is small, this may be good enough.
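
If the bloat does matter, a minimal sketch of a variant that escapes only
the non-ASCII characters might look like the following. The function name
escape_non_ascii is made up for illustration, and the sketch assumes that
the receiving side is happy with plain ASCII passed through unchanged
(backslashes, quotes and control characters are not escaped here).

```r
# Hypothetical helper, not part of base R: escape only non-ASCII
# characters as \uxxxx and pass ASCII characters through unchanged.
# Assumes a single string (utf8ToInt accepts one string only).
escape_non_ascii <- function(s) {
  cp <- utf8ToInt(enc2utf8(s))            # integer code points
  out <- ifelse(cp < 128L,
                intToUtf8(cp, multiple = TRUE),  # keep ASCII as is
                sprintf("\\u%04x", cp))          # escape everything else
  # Caveat: code points above 0xffff produce more than four hex digits
  # here; a consumer that expects exactly four would need a different
  # form for those (e.g. Ruby's \u{...}).
  paste(out, collapse = "")
}

b <- "Unicode character: \ufffd"
cat(escape_non_ascii(b), "\n")
```

Using cat rather than print avoids print's own escaping of the
backslash, so what goes down the pipe is the literal \ufffd sequence.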

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jttkim at gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*




More information about the R-help mailing list