[R] UTF-8 to the console

Tomas Kalibera
Thu Oct 13 17:41:41 CEST 2022


Dear Helmut,

thanks for the report. This is actually a bug in Rterm (or in Windows, it
is hard to tell, but it is something that can be fixed in Rterm). More below.

On 6/23/22 12:26, Helmut Schütz wrote:
> Dear all,
>
> I want to send UTF-8 characters to the console. Font in the 
> GUI-Preference 'Lucida Console', supporting the desired symbols:
> greater than or equal: UTF-8 2265, HTML-entity &ge; HTML-Unicode 
> &#8805; TeX \ge
> approximately equal: UTF-8 2248, HTML-entity &asymp; HTML-Unicode 
> &#8776; TeX \approx
>
> txt <- "x ≥ y, x \u2265 y; a ≈ b, a \u2248 b"
> Encoding(txt) <- "UTF-8"
> print(txt)
> [1] "x = y, x = y; a \230 b, a \230 b"
> cat(txt, "\n")
> x = y, x = y; a ˜ b, a ˜ b
>
> Desired "x ≥ y, x ≥ y; a ≈ b, a ≈ b"
>
> I'm sending the email in UTF-8. Don’t know how @r-project.org is 
> configured (ASCII?) If you see garbage, I'm sorry but you should get 
> the idea.
>
> R 4.2.0 on Windows 7 (UCRT10.0.10240.16390) and Windows 11.

The underlying problem, which I can reproduce on my Windows 10 (and which
is almost surely what you are seeing on Windows 11), is that the characters
≥ and ≈ cannot be pasted into Rterm when it runs in cmd.exe or PowerShell.
Pasting these characters pastes nothing.
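
To separate a console input or display problem from a problem in the string itself, it can help to build the string from \u escapes only (so nothing depends on pasting) and to inspect its code points and bytes rather than its printed form. The following is only a minimal sketch in base R; the variable names are illustrative and are not taken from the report.

```r
# Build the string from escapes only, so that pasting is not involved
txt <- "x \u2265 y; a \u2248 b"

# Code points of the characters outside ASCII
cps <- utf8ToInt(txt)
non_ascii <- cps[cps > 127L]
print(sprintf("U+%04X", non_ascii))

# Declared encoding of the string and the UTF-8 bytes of one character
print(Encoding(txt))
print(charToRaw("\u2265"))
```

If the code points are U+2265 and U+2248, the string is intact inside R, and any remaining garbling comes from how the console receives or displays it.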

I've fixed this now in R-devel 83094 (and R-patched 83095). I would be
grateful if you (or anyone else) could test it, e.g. in R-patched. Most
likely this example will work as it did for me, but please also try other
examples you can think of. Processing the input keys in Rterm/getline is
very tricky and brittle. What the code sees depends on what the console
implementation decides to do, which differs between console
implementations, and sadly this is not documented as far as I could find.

Now, the problem you reported does not happen in the Msys2/mintty terminal
(so with Rtools42), because that terminal uses a different console
implementation. The problem also doesn't happen with the Windows Terminal
application, which has yet another implementation. If you ever need a
work-around for such problems, I would recommend trying the Windows
Terminal application.

The problem doesn't happen in Rgui, either, but that uses a completely
different code path on the R end; indeed, it does not run Rterm.
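
When comparing behavior across front-ends, it can be useful to record which front-end and locale a session is actually using. The following is a small sketch using base R only; describe_console is a hypothetical helper name introduced here for illustration and is not part of R.

```r
# Hypothetical helper: summarize the front-end and locale of this session
describe_console <- function() {
  list(
    gui     = .Platform$GUI,                    # e.g. "Rgui" or "RTerm" on Windows
    utf8    = isTRUE(l10n_info()[["UTF-8"]]),   # TRUE if the session locale is UTF-8
    version = R.version.string
  )
}

info <- describe_console()
str(info)
```

Including this output in a report makes it easier to tell whether a problem was seen in Rterm or in Rgui, although it does not identify the console implementation (cmd.exe, mintty, Windows Terminal) hosting Rterm.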

There is a key combination, "Alt+I", that you can press in Rterm; it
switches to a debug mode that displays the keyboard codes R receives (it
matches the sources in getline.c). When something like your reported
problem behaves differently with different console implementations, it
usually comes with different keyboard codes being sent to R.

Your report has been very useful, thanks, and sorry for the long delay.
I would have spotted it earlier on the R bugzilla (or the R-devel list).

Best
Tomas

>
> Helmut


