[R] replacing unicode characters

Adrian Dușa du@@@@dr|@n @end|ng |rom un|buc@ro
Fri Jun 30 10:33:34 CEST 2023


Hello Iris,

Thanks for your answer. The thing is not that I want to obtain <U+00E7>,
but that it gets produced regardless.
In the meantime, I have found that the real culprit is probably related to
the locale or encoding of the terminal process that I use to run R in that
particular macOS Electron application (it doesn't happen on Windows).
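
A minimal sketch for checking that suspicion from inside the embedded R
session, using only base R. The variable names are just for illustration,
and the interpretation is an assumption: if LC_CTYPE turns out to be "C" or
"POSIX" and the UTF-8 flag is FALSE, the child process was most likely
started without a UTF-8 locale in its environment.

```R
# Character-type locale that R is actually running under
ctype <- Sys.getlocale("LC_CTYPE")

# TRUE if R considers the current locale to be UTF-8
utf8 <- l10n_info()[["UTF-8"]]

# Locale-related environment variables inherited from the parent process
# (NA means the variable is not set at all)
env <- Sys.getenv(c("LANG", "LC_ALL", "LC_CTYPE"), unset = NA)

print(ctype)
print(utf8)
print(env)
```

Running the same lines in R started from a regular Terminal window gives a
baseline to compare against.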

In a very simple test, I tried creating a text file from the R session
embedded in the Electron app:
sink("test.txt")
cat("\u00e7")
sink()

which resulted in:

<U+00E7>

I don't quite understand how this works. My best guess is that it matters
less how R interprets these characters than how they are passed through the
child process that started R.
I'd be grateful for any hint in this direction, if anyone has experience
with this.
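
One way to separate the two possibilities, sketched with base R only: write
the string to a file as raw bytes, so that no conversion to the native
encoding takes place, and then inspect the bytes. The file name comes from
tempfile() and the variable names are hypothetical. If the bytes are the
UTF-8 encoding of the character (c3 a7) followed by a newline, then R holds
the string correctly and the <U+00E7> substitution happens only when output
is converted for a non-UTF-8 locale.

```R
x <- "\u00e7"

out <- tempfile(fileext = ".txt")

# Open in binary mode so line endings are not translated
con <- file(out, open = "wb")

# useBytes = TRUE writes the bytes of the string as they are,
# without re-encoding them to the native encoding of the session
writeLines(enc2utf8(x), con, useBytes = TRUE)
close(con)

# Read the file back as raw bytes and show them
bytes <- readBin(out, what = "raw", n = 16)
print(bytes)
```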

Best wishes,
Adrian

On Thu, Jun 29, 2023 at 1:59 AM Iris Simmons <ikwsimmo using gmail.com> wrote:

> Hiya!
>
>
> You can do this by specifying sub="c99" instead of "Unicode":
>
> ```R
> x <- "fa\xE7ile"
> xx <- iconv(x, "latin1", "UTF-8")
> iconv(xx, "UTF-8", "ASCII", "c99")
> ```
>
> produces:
>
> ```
> > x <- "fa\xE7ile"
> > xx <- iconv(x, "latin1", "UTF-8")
> > iconv(xx, "UTF-8", "ASCII", "c99")
> [1] "fa\\u00e7ile"
> >
> ```
>
> For future reference, you can find this in the Examples section of the
> help page ?iconv.
> I hope this helps!
>
> On Wed, Jun 28, 2023 at 3:09 PM Adrian Dușa <dusa.adrian using gmail.com> wrote:
> >
> > Dear list,
> >
> > Building on the example from ?iconv:
> > x <- "fa\xE7ile"
> > xx <- iconv(x, "latin1", "UTF-8") # "façile"
> >
> > and:
> >
> > iconv(xx, "UTF-8", "ASCII", "Unicode")
> > # "fa<U+00E7>ile"
> >
> > This is the type of result I sometimes get from an R script that I cannot
> > reproduce here, because it depends on a terminal process started in a
> > compiled Electron (Node.js) application, under MacOS.
> >
> > I was wondering, is there a standard way, perhaps also using iconv(), to
> > convert this type of result to a more manageable unicode representation?
> >
> > Something like: "fa\u00e7ile"
> >
> > Or perhaps a clever regexp, for any number of such occurrences in a
> > string?
> >
> > Thanks a lot in advance,
> > Adrian
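
A minimal sketch of the regexp route asked about above, in base R. The
function name decode_u is hypothetical, and the pattern assumes the
placeholders always look like <U+XXXX> with 4 to 8 hexadecimal digits. The
first call rewrites each placeholder as a literal backslash-u escape, the
second replaces it with the character itself.

```R
x <- "fa<U+00E7>ile"

# Variant 1: turn <U+00E7> into the six literal characters \u00E7
escaped <- gsub("<U\\+([0-9A-Fa-f]{4,8})>", "\\\\u\\1", x)

# Variant 2: turn <U+00E7> into the actual character
decode_u <- function(s) {
  m <- gregexpr("<U\\+[0-9A-Fa-f]{4,8}>", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(tok) {
    # Strip the "<U+" prefix and ">" suffix, leaving the hex digits
    hex <- gsub("<U\\+|>", "", tok)
    # Convert each hex code point to a one-character UTF-8 string
    vapply(hex, function(h) intToUtf8(strtoi(h, 16L)),
           character(1), USE.NAMES = FALSE)
  })
  s
}

decoded <- decode_u(x)

print(escaped)
print(decoded)
```

Both variants handle any number of placeholders per string, and decode_u
also accepts a character vector.
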


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu
