[R] replacing unicode characters

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Fri Jun 30 13:10:27 CEST 2023


On Fri, 30 Jun 2023 11:33:34 +0300
Adrian Dușa <dusa.adrian using unibuc.ro> wrote:

> In a very simple test, I tried creating a text file from the Electron
> app embedded R:
> sink("test.txt")
> cat("\u00e7")
> sink()
> 
> which resulted in:
> 
> <U+00E7>
> 
> I don't quite understand how this works, my best guess is it matters
> less how R interprets these characters, but how they are passed
> through the child process that started R.

The locale does not seem to be set correctly when the R child process
is launched. For example, from a shell running in a UTF-8 locale:

Rscript -e 'cat("\ue7\n")'
# ç

but with the C locale forced:

LC_ALL=C Rscript -e 'cat("\ue7\n")'
# <U+00E7>

When preparing \ue7 for output, R decides that the character is not
representable in the session encoding and prints the <U+00E7> escape
instead. What's the output of sessionInfo() and l10n_info() in the
child process?
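
In case it helps, here is a rough sketch of a few extra checks to run
inside the embedded session, next to sessionInfo() and l10n_info(). It
only uses base R. The choice of environment variables to inspect (LANG,
LC_ALL, LC_CTYPE) is my assumption about what the parent process may or
may not be passing down; the names info, ctype, env and utf8_bytes are
just illustrative.

```r
# Diagnostic sketch: run inside the embedded (child) R session.
info <- l10n_info()
print(info[["UTF-8"]])     # TRUE when R treats the session encoding as UTF-8

ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)               # a value of "C" or "POSIX" would explain <U+00E7>

# Assumption: these are the variables the parent process would need to set.
env <- Sys.getenv(c("LANG", "LC_ALL", "LC_CTYPE"), unset = NA)
print(env)                 # NA means the variable is not set at all

# The string itself is intact in memory; only the output conversion fails.
x <- "\u00e7"
utf8_bytes <- charToRaw(enc2utf8(x))
print(utf8_bytes)          # [1] c3 a7
```

If the first value is FALSE and LC_CTYPE is "C", the child process is
most likely being started without a UTF-8 locale in its environment,
which would match the LC_ALL=C example above.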

-- 
Best regards,
Ivan


