[R] Unicode normalization?

Allan Engelhardt allane at cybaea.com
Wed Jun 17 17:35:43 CEST 2009


Does R support Unicode normalization?  For my application, I'd like to 
test for canonical equivalence (e.g. "n\u0303" is canonically equivalent 
to "\u00F1", which is ñ) and ideally convert strings to NFD form.  
("\u0303" is U+0303, the "combining tilde" character.)  Is there a 
package for this?

The Unicode Normalization FAQ [1] states that "Programs should always 
compare canonical-equivalent Unicode strings as equal", so should it be 
considered a bug that "n\u0303" != "\u00F1" in my version of R?
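
To make the question concrete, here is a minimal sketch of the comparison. The base R part only shows that the two spellings are different code point sequences. The second part is an assumption: it relies on the third-party stringi package (an ICU wrapper) being installed, and uses its stri_trans_nfd() and stri_cmp_equiv() functions, which are not part of base R. The variable names are only illustrative.

```r
# Two spellings of the same character, taken from the example above.
decomposed  <- "n\u0303"  # U+006E (n) followed by U+0303 (combining tilde)
precomposed <- "\u00F1"   # U+00F1 (n with tilde, precomposed)

# Base R: inspect the underlying code points. The sequences differ,
# so a plain comparison treats the strings as unequal.
cp_decomposed  <- utf8ToInt(decomposed)   # two code points
cp_precomposed <- utf8ToInt(precomposed)  # one code point
plain_equal <- identical(cp_decomposed, cp_precomposed)

# Assumption: the stringi package is available. It is not part of base R.
if (requireNamespace("stringi", quietly = TRUE)) {
  # Convert the precomposed form to NFD (canonical decomposition).
  nfd <- stringi::stri_trans_nfd(precomposed)
  same_after_nfd <- identical(utf8ToInt(nfd), cp_decomposed)

  # Test canonical equivalence directly, without normalizing first.
  equivalent <- stringi::stri_cmp_equiv(decomposed, precomposed)
}
```

Normalizing both strings to the same form before comparing them is one way to get the behaviour the FAQ describes; whether base R's own comparison ought to do this is the open question.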

Allan

[1] see http://www.unicode.org/unicode/faq/normalization.html



