[R] on specifying an encoding for plot's main-argument

Daniel Bastos dbastos at toledo.com
Mon Feb 1 20:56:01 CET 2016


Duncan Murdoch <murdoch.duncan at gmail.com> writes:

> On 29/01/2016 10:35 AM, Daniel Bastos wrote:
>> Here's how I plot a graph.
>>
>>    plot(c(1,2,3), main = "graph ç")
>>
>> The main-string has a UTF-8 character "ç".  I believe I'm using the
>> windows device.  It opens up on my screen.  (The window says ``R
>> Graphics: Device 2 (ACTIVE)''.)  How can I tell it to use my encoding of
>> choice?
>
> As far as I know that's impossible.  R uses the system encoding, and I
> don't think any Windows versions use UTF-8 code pages.  They use
> UTF-16 for wide characters, and some 8 bit encoding for byte-sized
> characters. R will use whatever 8 bit code page Windows chooses.

You seem to be correct.  Here's what Microsoft has to say.  ``[...]
UTF-16 [...] is the most common encoding of Unicode and the one used for
native Unicode encoding on Windows operating systems.''[1] 

They also claim that ``[w]hile Unicode-enabled functions in Windows use
UTF-16, it is also possible to work with data encoded in UTF-8 or UTF-7,
which are supported in Windows as multibyte character set code
pages.''[1]

But I couldn't verify the claim.

The documentation of setlocale[2] says the ``set of available locale
names, languages, country/region codes, and code pages includes all
those supported by the Windows NLS API except code pages that require
more than two bytes per character, such as UTF-7 and UTF-8. If you
provide a code page value of UTF-7 or UTF-8, setlocale will fail,
returning NULL.''[2]

That seems to be correct as per the following C code.

  const char *loc = setlocale(LC_ALL, "UTF-8");
  printf("locale: %s\n", loc ? loc : "(null)");

And [3] makes me think that _wsetlocale behaves the same way:
``_wsetlocale [...] is a wide-character version of setlocale; the
arguments and return values of _wsetlocale are wide-character strings.''
The following program seems to confirm it.

#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void) {
  const wchar_t *loc = _wsetlocale(LC_ALL, L"UTF-8"); /* wide literal, not a cast */
  printf("locale: %ls\n", loc ? loc : L"(null)");
  return 0;
}

[...]

(*) A workaround

Since R comes with iconv(), the following might be a safe way to
translate UTF-8 into the encoding of the current system locale, so that
plot titles display correctly on Windows systems.

  iconv("utf8-string", from="UTF-8",
     to=localeToCharset(Sys.getlocale("LC_CTYPE"))[1])

(*) References

[1] MSDN Unicode
https://msdn.microsoft.com/en-us/library/windows/desktop/dd374081(v=vs.85).aspx

[2] MSDN setlocale
https://msdn.microsoft.com/en-us/library/x99tb11d.aspx

[3] MSDN Locales and Code Pages
https://msdn.microsoft.com/en-us/library/8w60z792.aspx


