[Rd] Printing chinese characters (UTF-8) on R 3.5.2 -windows 10

Fri Sep 13 13:33:59 CEST 2019

On Fri, Sep 13, 2019 at 11:53 AM Tomas Kalibera <tomas.kalibera using gmail.com>
wrote:

> On 9/13/19 11:37 AM, IAGO GINÉ VÁZQUEZ wrote:
> > But if I type
> > >"會"
> > the output is
> > [1] "會"
> > so seemingly it can be represented. Or, am I wrong?
>
> In RGui you can print the string, because RGui is a Windows Unicode
> application (uses UTF-16LE and bypasses the C runtime for strings). But
> it is just the GUI; R itself (and hence also packages) uses the current
> native encoding as defined by the C runtime. RGui will make sure R gets
> the string in UTF-8, but as soon as you do anything even slightly
> non-trivial, which includes formatting, the string will be converted to
> the current native encoding. Some R functions allow you to do certain
> things in UTF-8 without conversion to native encoding, you'd have to
> read very carefully the documentation for each function - but for
> practical use, you either need to live with the misinterpretation of
> some characters, or use Windows in the locale where your characters can
> be represented (e.g. Chinese locale when working with Chinese strings),
> or use Linux/macOS. On Linux/macOS the current native encoding can be
> UTF-8, so there is no problem. On Windows, with the current toolchain
> based on mingw, this is not possible.
>

mingw-w64 can process UTF-8 (it is just bytes, after all). Can you explain
what you mean here? Would any other compiler on Windows avoid this
problem?
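
For reference, the behaviour under discussion can be inspected with a short
R sketch. It is only an illustration under stated assumptions: the variable
names are made up for this example, and the output of format() depends on
the platform and locale (the "<U+6703>" escape was reported above for
R 3.5.2 on Windows in a locale that cannot represent the character).

```r
# Sketch only: inspect how R marks and formats the string from this thread.
x <- "\u6703"                      # the character discussed above, U+6703

# Declared encoding mark of the string (a string written with a \u escape
# is marked as UTF-8).
print(Encoding(x))

# The code point really is U+6703.
is_u6703 <- utf8ToInt(x) == 0x6703L
print(is_u6703)

# Is the current native encoding UTF-8? As described above, this can be the
# case on Linux/macOS, but not on Windows with R 3.5.2.
native_is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])
print(native_is_utf8)

# format() may convert to the native encoding; where the character cannot
# be represented, the result can be an escape such as "<U+6703>".
formatted <- format(x)
print(formatted)
```

If native_is_utf8 is FALSE and the locale cannot represent the character,
the last line is where the escape would be expected to show up.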

>
>
> Best
> Tomas
>
> >
> > Best
> > Iago
> > ------------------------------------------------------------------------
> > *From:* Tomas Kalibera <tomas.kalibera using gmail.com>
> > *Sent:* Friday, 13 September 2019 11:24
> > *To:* IAGO GINÉ VÁZQUEZ <i.gine using pssjd.org>; r-devel using r-project.org
> > <r-devel using r-project.org>
> > *Subject:* Re: [Rd] Printing chinese characters (UTF-8) on R 3.5.2
> > -windows 10
> > On 9/13/19 11:01 AM, IAGO GINÉ VÁZQUEZ wrote:
> > > I have a Chinese character in a data frame, but printing it outputs
> > > its Unicode code point escape instead. Concretely, the character is 會
> > > and the code point is U+6703. Following the code, I arrive at the call
> > >
> > >> base::format.default("會")
> > > which prints
> > >
> > > [1] "<U+6703>"
> > >
> > > I do not know the extent of this behaviour, nor whether it persists
> > > in more recent versions of R.
> > >
> > > Is it expected?
> >
> > If you are running this on Windows in an encoding where the character
> > cannot be represented (e.g. non-Chinese locale), then yes, this is
> > expected behavior.
> >
> > On Unix systems where R can run in UTF-8 encoding (Linux, macOS), the
> > character will be formatted/displayed properly.
> >
> > Best
> > Tomas
> >
> > >
> > > Thank you!
> > >
> > > Iago
> > >
> > >
> > > ______________________________________________
> > > R-devel using r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >