[Rd] How to get utf8 string using R externals

brodie gaslam brodie.gaslam using yahoo.com
Thu Jun 3 03:08:05 CEST 2021


> On Wednesday, June 2, 2021, 7:58:54 PM EDT, xiaoyan yu <xiaoyan.yu using gmail.com> wrote:
>
> I am using gmail. Not sure of the configuration of plain text.
> The memory pointed by the char * as the output of Rf_translateChar() is
> actually the string "<U+BD80><U+C2E4>".

Hi Xiaoyan,

Unfortunately I'm not super familiar with R on Windows, but I think
I can provide a simpler reproducible example.  In Rgui, if I type "\UBD80"
at the prompt and hit enter, I see the desired glyph.  In Rterm I see the
Unicode escape instead.

IIRC, the capabilities of Rterm and Rgui differ, and UTF-8 support
on Windows is limited.  Tomas Kalibera discusses this in some detail:

https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html

In terms of `Rf_translateChar()`, presumably the `Riconv` call is failing
on Rterm, but not on Rgui:

https://github.com/r-devel/r-svn/blob/master/src/main/sysutils.c#L924

I'm guessing, but that would explain why the C-level string is in that
format.  I don't know why the string would translate in Rgui, though.  My
guess is that it does not, since even in Rgui the following:

    enc2native("\uBD80")

produces the escaped version of the string.

As others have suggested, you could try the experimental UCRT Windows release:

https://developer.r-project.org/Blog/public/2021/03/12/windows/utf-8-toolchain-and-cran-package-checks/index.html

Install instructions (focus on the binary installer):

https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/howto.html

If I try the UCRT build on my system, this no longer produces the escape:

    enc2native("\uBD80")

All I see, though, is a question mark.  My guess is that my code page or
something similar is not set correctly.  Examining the result with
`charToRaw` shows that the string remains UTF-8 encoded.

Aside: it's not clear to me that you need to translate the string at all if
your intent is for it to remain UTF-8.  You just don't currently seem to be
set up to interpret UTF-8 strings.

Best,

B

> On Wed, Jun 2, 2021 at 6:09 PM David Winsemius <dwinsemius using comcast.net>
> wrote:
>
>> First, you should configure your mail client to send plain text.
>>
>> Can you explain what is meant by:
>>
>> the characters are unicodes (<U+BD80><U+C2E4>) instead of
>> utf8 encoding of the korean characters 부실.
>>
>> As far as I can tell those two unicodes _are_ the utf8 encodings of 부실.
>>
>> You may need to consult a couple of R help pages. I suggest:
>>
>> ?Quotes
>> ?points  # has examples of changing fonts used for display on console.
>>
>> Sorry if I've misunderstood. I'm not on a Windows device, so  posting the
>> C++ program won't be helpful, but maybe it would for other prospective
>> respondents.
>>
>> --
>> David.
>>
>> On 6/2/21 1:33 PM, xiaoyan yu wrote:
>> > I have a R Script Predict.R:
>> >      set.seed(42)
>> >      C <- seq(1:1000)
>> >      A <- rep(seq(1:200),5)
>> >      E <- (seq(1:1000) * (0.8 + (0.4*runif(50, 0, 1))))
>> >      L <- ifelse(runif(1000)>.5,1,0)
>> >      df <- data.frame(cbind(C, A, E, L))
>> > load("C:/Temp/tree.RData")                #  load the model for scoring
>> >
>> >    P <- as.character(predict(tree_model_1,df,type='class'))
>> >
>> > Then in a C++ program
>> > I call eval to evaluate the script and then findVar the P variable.
>> > After get each class label from P using string_elt and then
>> > Rf_translateChar, the characters are unicodes (<U+BD80><U+C2E4>) instead of
>> > utf8 encoding of the korean characters 부실.
>> > Can I know how to get UTF8 by using R externals?
>> >
>> > I also found the same script giving utf8 characters in RGui but unicode in
>> > Rterm.
>> > I tried to attach a screenshot but got message "The message's content type
>> > was not explicitly allowed"
>> > In RGui, I saw the output 부실, while in Rterm, <U+BD80><U+C2E4>.
>> >
>> > Please help.
>> >
>> > ______________________________________________
>> > R-devel using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel