[R] replacing unicode characters

Adrian Dușa <dusa.adrian using unibuc.ro>
Fri Jun 30 16:34:37 CEST 2023


Right on point, Ivan, that was the issue. The output from l10n_info()
was:

$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] FALSE

$codeset
[1] "US-ASCII"

(and the locale was just "C")

I simply needed to write something like:
export LC_ALL='en_US.UTF-8'

before starting the child process, and everything looks good now.
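
For anyone landing here later, a minimal sketch of the same idea as a small
shell helper. The function name run_with_utf8 and the variable UTF8_LOCALE
are made up for illustration, and 'en_US.UTF-8' is only an assumption about
which UTF-8 locale is installed (check `locale -a` on your machine). It sets
LC_ALL for the child process only, instead of exporting it in the parent
shell:

```shell
# Hypothetical helper: run a command with a UTF-8 locale in its environment.
# UTF8_LOCALE is an assumption; use a name that `locale -a` lists on your system.
UTF8_LOCALE="${UTF8_LOCALE:-en_US.UTF-8}"

run_with_utf8() {
    # env sets LC_ALL for the child only; the calling shell is left untouched.
    env LC_ALL="$UTF8_LOCALE" "$@"
}

# Usage (requires Rscript on the PATH):
#   run_with_utf8 Rscript -e 'cat("\u00e7\n")'
#   run_with_utf8 Rscript -e 'print(l10n_info())'
```

Running l10n_info() through the helper is a quick way to confirm that the
child session now reports a UTF-8 codeset rather than "US-ASCII".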

Thanks a lot, much obliged,
Adrian


On Fri, Jun 30, 2023 at 2:10 PM Ivan Krylov <krylov.r00t using gmail.com> wrote:

> On Fri, 30 Jun 2023 11:33:34 +0300
> Adrian Dușa <dusa.adrian using unibuc.ro> wrote:
>
> > In a very simple test, I tried creating a text file from the Electron
> > app embedded R:
> > sink("test.txt")
> > cat("\u00e7")
> > sink()
> >
> > which resulted in:
> >
> > <U+00E7>
> >
> > I don't quite understand how this works. My best guess is that it matters
> > less how R interprets these characters than how they are passed
> > through the child process that started R.
>
> Something goes wrong with the locale setting when the R child process
> is being launched. For example,
>
> Rscript -e 'cat("\ue7\n")'
> # ç
>
> but:
> LC_ALL=C Rscript -e 'cat("\ue7\n")'
> # <U+00E7>
>
> When preparing \ue7 for output, R decides that it's not representable
> in the session encoding. What's the output of sessionInfo() and
> l10n_info() in the child process?
>
> --
> Best regards,
> Ivan
>
