[R] Plotting the ASCII character set.
kry|ov@r00t @end|ng |rom gm@||@com
Sun Jul 4 13:01:44 CEST 2021
On Sun, 4 Jul 2021 13:59:49 +1200
Rolf Turner <r.turner using auckland.ac.nz> wrote:
> a substantial number of the characters are displayed as a wee
> rectangle containing a 2 x 2 array of digits such as
> > 0 0
> > 8 0
Interesting. I didn't pay attention to it at first, but now I see that
the range of code points U+0080 to U+009F corresponds to control
characters, not anything printable (also, U+00A0 is a no-break
space). That is why the plot shows placeholder boxes carrying the hex
digits of the code point (0080 in your example). Latin-1 doesn't
define any meaning for bytes 0x80..0x9f, but here they are decoded to
same-valued Unicode code points. And the actual code point for € is
U+20AC, not even close to what we're working with.
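For illustration, here is a small sketch of the code points involved. It
assumes the system iconv() recognises the name "CP1252" (glibc and
libiconv do); the variable names are only illustrative.

```r
# Euro sign: its Unicode code point and its single-byte form in Windows-1252
euro_cp   <- utf8ToInt("\u20ac")
euro_byte <- iconv("\u20ac", "UTF-8", "CP1252", toRaw = TRUE)[[1]]
sprintf("U+%04X", euro_cp)   # "U+20AC"
euro_byte                    # 80 where CP1252 is available

# U+0080..U+009F is the block of C1 control characters (32 code points)
c1 <- 0x80:0x9F

# U+0080 itself takes two bytes in UTF-8
c1_first_utf8 <- charToRaw(intToUtf8(0x80))
c1_first_utf8                # c2 80
```

So the byte 0x80 only means € under Windows-1252; read as a code point it
is a control character.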
> Also note that there is a bit of difference between the results of
> using Encoding() and the results of using iconv()
You are right. I didn't know that, but my reading of the function
translateToNative in src/main/sysutils.c suggests that R decodes
strings marked as 'latin1' as Windows-1252 (if the system iconv()
provides it) and uses the actual Latin-1 as a fallback.
?Encoding does warn that 'latin1' is ambiguous and system-dependent
with regard to bytes 0x80..0x9f, so text() seems to be right to treat
byte 0x80 marked as CE_LATIN1 as Latin-1, not Windows-1252, and plot
it as U+0080. There is, however, a /* FIXME: allow CP1252? */ comment
in the function reEnc in src/main/sysutils.c, which is used by text().
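The two routes can be compared with a short sketch. Outputs are
deliberately not shown, since they depend on the platform's iconv(). The
variable names are only illustrative, and "ISO-8859-1" is my choice for
requesting strict Latin-1 from iconv().

```r
b <- rawToChar(as.raw(0x80))

# Route 1: mark the byte as latin1 and let R translate it to UTF-8.
# Depending on the platform this may come out as U+20AC or U+0080.
marked <- b
Encoding(marked) <- "latin1"
via_mark <- utf8ToInt(enc2utf8(marked))

# Route 2: ask iconv() explicitly for ISO-8859-1
via_iconv <- utf8ToInt(iconv(b, "ISO-8859-1", "UTF-8"))

# Show both results as code points
shown <- sprintf("U+%04X", c(via_mark, via_iconv))
shown
```

If the two entries differ on your system, that is the discrepancy between
Encoding() and iconv() described above.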
> Is there any way that I can get the Euro symbol to display correctly
> in such a graphic?
I think that iconv(a, 'CP1252', '', '\ufffd') should work for you. At
least it seems to work for the € sign. It does leave the following
bytes undefined, represented as � U+FFFD REPLACEMENT CHARACTER:
as.raw(which(is.na(iconv(sapply(as.raw(1:255), rawToChar), 'CP1252', ''))))
# [1] 81 8d 8f 90 9d
Not sure what can be done about those. With Latin-1, they would
correspond to unprintable control characters anyway.
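Putting it together, a minimal sketch of the suggestion. It converts to
"UTF-8" explicitly instead of '' so that it does not depend on the session
locale, and plot_upper_half() is a hypothetical helper, not an existing
function.

```r
# Bytes 0x01..0xFF as one-character strings (an R string cannot hold 0x00)
a <- sapply(as.raw(1:255), rawToChar)

# Decode as Windows-1252; bytes without a mapping are replaced by U+FFFD
glyphs <- iconv(a, "CP1252", "UTF-8", sub = "\ufffd")

# Hypothetical helper: draw bytes 0x80..0xFF on a 16 x 8 grid,
# one row per high nibble, with 0x80 in the top-left cell
plot_upper_half <- function(glyphs) {
  codes <- 0x80:0xFF
  plot(NULL, xlim = c(0.5, 16.5), ylim = c(0.5, 8.5),
       axes = FALSE, xlab = "", ylab = "")
  text(codes %% 16 + 1, 16 - codes %/% 16, glyphs[codes])
}
```

Calling plot_upper_half(glyphs) should then put € in the first cell,
provided the graphics device and its font can render it.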