[Rd] Encoding API

Thomas Friedrichsmeier thomas.friedrichsmeier at ruhr-uni-bochum.de
Thu Feb 15 21:10:41 CET 2007


I've been observing the recent SVN log entries about encoding information in 
CHARSXPs with great interest. This looks like a very nice addition. While 
this is still work in progress, I'd like to suggest the following extra:

At least in RKWard, all displayed strings need to be converted to UTF-8 (the 
encoding we use to construct Qt QStrings). This needs to be done 
independently of the current locale and of the encoding used in the embedded R 
process. I imagine other graphical or non-graphical toolkits will similarly 
use UTF-8 for their strings.

For this reason, an addition of e.g.

char* Rf_translateCharToUTF8(SEXP);

would be nice. This function would translate to UTF-8 independently of the 
current LC_CTYPE. While it is possible to achieve the same effect by first 
translating the strings to the current LC_CTYPE encoding (using 
Rf_translateChar()), and then translating to UTF-8 in a second step (using 
custom means, if needed), being able to do this conversion in a single step 
would be more elegant, and would also potentially avoid expensive recoding 
steps.
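
To illustrate the "second step" of the two-step route described above, here is a minimal sketch using POSIX iconv(). The helper name recode_to_utf8() is hypothetical and not part of R. In an embedding application the input would be the result of Rf_translateChar(), and the source codeset would be that of the current locale (for example as reported by nl_langinfo(CODESET)); both are passed in explicitly here so that the sketch stays independent of the R headers.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the R API): recode a NUL-terminated
 * string from `from_codeset` to UTF-8. Returns a malloc()ed string that
 * the caller must free(), or NULL on failure. */
char *recode_to_utf8(const char *from_codeset, const char *in)
{
    assert(from_codeset != NULL && in != NULL);

    iconv_t cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t) -1)
        return NULL;                 /* unknown or unsupported codeset */

    size_t in_left = strlen(in);
    /* Generous bound: up to 4 output bytes per input byte, plus the NUL. */
    size_t out_size = 4 * in_left + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *) in;      /* iconv() does not modify the input */
    char *out_ptr = out;
    size_t out_left = out_size - 1;  /* reserve room for the terminator */

    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t) -1) {
        free(out);
        iconv_close(cd);
        return NULL;                 /* invalid or incomplete input sequence */
    }

    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

With this approach, a string that R already holds as UTF-8 is first recoded to the locale encoding and then back to UTF-8. That round trip costs time and can fail or lose characters when the locale encoding cannot represent them, which is the motivation for a single-step function.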

Alternatively, having access to the IS_UTF8 and IS_LATIN1 macros from C would 
be good enough to hand-code efficient conversion to UTF-8 (but may be too 
close to the internals).
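
As a rough sketch of what such hand-coded conversion could look like: the enum below is a hypothetical stand-in for the information the IS_UTF8 and IS_LATIN1 macros would provide, and marked_to_utf8() is an illustrative name, not an existing R function. Strings marked as UTF-8 are copied unchanged, Latin-1 strings are expanded byte by byte, and anything else is left to the locale-based route.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-string encoding mark that the
 * IS_UTF8 / IS_LATIN1 macros would expose. */
typedef enum { ENC_NATIVE, ENC_LATIN1, ENC_UTF8 } str_encoding;

/* Hypothetical helper: return a malloc()ed UTF-8 copy of `in`.
 * Returns NULL for ENC_NATIVE (the caller must then fall back to a
 * locale-aware conversion) or if memory allocation fails. */
char *marked_to_utf8(const char *in, str_encoding enc)
{
    assert(in != NULL);
    size_t len = strlen(in);

    if (enc == ENC_UTF8) {
        /* Already UTF-8: a plain copy, no recoding needed. */
        char *copy = malloc(len + 1);
        if (copy != NULL)
            memcpy(copy, in, len + 1);
        return copy;
    }
    if (enc != ENC_LATIN1)
        return NULL;

    /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];
        if (c < 0x80) {
            *p++ = (char) c;                    /* ASCII passes through */
        } else {
            *p++ = (char) (0xC0 | (c >> 6));    /* lead byte: 110000xx */
            *p++ = (char) (0x80 | (c & 0x3F));  /* continuation: 10xxxxxx */
        }
    }
    *p = '\0';
    return out;
}
```

The Latin-1 case is simple because Latin-1 byte values coincide with the first 256 Unicode code points, so no lookup table is needed.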

I'm not sure whether this is considered important enough to warrant inclusion 
in the API, but I just wanted to throw in the idea in time.

Thomas Friedrichsmeier