[Rd] special latin1 do not print as glyphs in current devel on windows

Daniel Possenriede possenriede at gmail.com
Mon Jul 31 18:17:59 CEST 2017


Sorry if this is the wrong list, but I think I may have found a
regression in current R-devel.

Namely, non-ASCII characters marked as latin1 are no longer printed as
glyphs in R 3.5.0 devel, although they were in R 3.4.1.

This output is from

# R version 3.4.1 (2017-06-30) -- "Single Candle"
# Platform: x86_64-w64-mingw32/x64 (64-bit)

> x <- c("€", "–", "‰") # Euro, en-dash, per mille
> # v3.4.1 prints latin1 characters fine
> print(x)
[1] "€" "–" "‰"

And this (and all following) output is from

# R Under development (unstable) (2017-07-30 r73000) -- "Unsuffered Consequences"
# Platform: x86_64-w64-mingw32/x64 (64-bit)

> x <- c("€", "–", "‰") # Euro, en-dash, per mille
> # printed as escapes with 3.5.0 devel
> print(x)
[1] "\u0080" "\u0096" "\u0089"

The possible regression ends here; all of the following output is the same
in R 3.4.1 and R 3.5.0 devel.

A possibly separate but, IMHO, related issue is that converting to UTF-8
does not help, and that information is lost when converting back to the
native encoding.

First, the characters are also printed as escapes after conversion to
UTF-8, which is unexpected given that other characters entered as Unicode
escapes are printed as glyphs (see below).

> Encoding(x)
[1] "latin1" "latin1" "latin1"
> x_utf8 <- enc2utf8(x)
> Encoding(x_utf8)
[1] "UTF-8" "UTF-8" "UTF-8"
> print(x_utf8)
[1] "\u0080" "\u0096" "\u0089"

Converting back to native is lossy (which, to me, is also unexpected).

# When converting x_utf8 back to native encoding, chars are not marked as
# latin1 ...
> x_nat <- enc2native(x_utf8)
> Encoding(x_nat)
[1] "unknown" "unknown" "unknown"
> print(x_nat)
[1] "<U+0080>" "<U+0096>" "<U+0089>"

Other Unicode characters print fine as glyphs when entered as escapes (cf.
enc2utf8(x) above).

> z <- c("\u215B", "\u2105", "\u03B7") # 1/8, c/o, eta
> Encoding(z)
[1] "UTF-8" "UTF-8" "UTF-8"
> print(z)
[1] "⅛" "℅" "η"

But converting the encoding is not a good idea here either.

> z_nat <- enc2native(z)
> Encoding(z_nat)
[1] "unknown" "unknown" "unknown"
> z_utf8 <- enc2utf8(z_nat)
> Encoding(z_utf8)
[1] "unknown" "unknown" "unknown"
> print(z_utf8)
[1] "<U+215B>" "<U+2105>" "<U+03B7>"



