[Rd] A question about the API mkchar()

Fán Lóng foylong at gmail.com
Tue Oct 28 11:26:33 CET 2008


Hi guys,


I have a question about the API mkChar(). I have run into some difficulty
passing a UTF-8 string to mkChar() in R-2.7.0.



I intended to pass a UTF-8 string str_jan (containing Japanese
characters such as ふ, whose UTF-8 encoding is the bytes E3 81 B5) to
the R API function SEXP mkChar(const char *name). All we need is to
create the SEXP from the string we pass in.
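
The byte sequence quoted above can be checked independently of R. The
following is only a minimal sketch in plain C: utf8_encode is a
hypothetical helper (it is not part of the R API), and it assumes that
ふ is the Unicode code point U+3075.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the R API.
 * Encode one Unicode code point as UTF-8 into out, which must have
 * room for at least 4 bytes. Returns the number of bytes written,
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx followed by 3 x 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

Calling utf8_encode(0x3075, buf) writes three bytes, E3 81 B5, which
matches the encoding given for ふ.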



Unfortunately, I found that when str_jan is passed in, R automatically
converts it according to the current locale setting, so the function
only works correctly in the English locale. Under other locales, such
as Japanese or Chinese, the string is converted incorrectly. As a
matter of fact, the UTF-8 bytes are already a Unicode string and do not
need to be converted at all.
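
To illustrate why no locale should be involved: decoding UTF-8 depends
only on the bytes themselves. The sketch below is plain C and makes no
claim about what R does internally; utf8_decode is a hypothetical
helper, not an R function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the R API.
 * Decode the first UTF-8 sequence in s (len bytes available).
 * On success, stores the code point in *cp and returns the number of
 * bytes consumed. Returns 0 if the input is empty or malformed.
 * No locale function is called anywhere. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t value, min;

    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;

    if (s[0] < 0x80) {                       /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {      /* 2-byte lead */
        need = 2; value = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {      /* 3-byte lead */
        need = 3; value = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {      /* 4-byte lead */
        need = 4; value = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                            /* stray continuation or invalid lead byte */
    }

    if (len < need)
        return 0;                            /* truncated sequence */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                        /* not a continuation byte */
        value = (value << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (value < min || value > 0x10FFFF ||
        (value >= 0xD800 && value <= 0xDFFF))
        return 0;

    *cp = value;
    return need;
}
```

Given the bytes E3 81 B5, this yields the code point U+3075 whatever
setlocale() has been set to, since the function never consults it.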



I also tried SEXP Rf_mkCharCE(const char *, cetype_t), passing CE_UTF8
as the cetype_t argument, but the result was worse: it came back as UCS
codes, a kind of Unicode used on the Windows platform.



All I want is a SEXP object containing the original UTF-8 string, no
matter which locale is currently set. What is the usual way to do
this?





Thanks,

Long


