[Rd] Support writing UTF-8 output in Windows

Philippe Grosjean phgrosjean at sciviews.org
Sun Nov 10 09:56:51 CET 2013


To continue this discussion with constructive proposals, here is a page that provides useful pointers for using UTF-8 on Windows: http://www.utf8everywhere.org (second half of the page).

On 10 Nov 2013, at 00:58, Ben Bolker <bbolker at gmail.com> wrote:

> Duncan Murdoch <murdoch.duncan <at> gmail.com> writes:
> 
>> 
>> On 13-11-09 12:07 PM, Sverre Stausland wrote:
>>> As recently discussed on Stack Overflow, R for Mac OS and Ubuntu (so
>>> probably all Unix systems) can correctly write files with UTF-8
>>> encoding, but R for Windows cannot:
>> 
>> That's not an accurate description of the problem.  Some functions in R 
>> convert values to the native encoding, but not all do.
>> 
>>> http://stackoverflow.com/questions/19877676/write-utf-8-files-from-r
>>> 
>>> I strongly suggest that R for Windows should support this feature in
>>> upcoming versions.
>> 
>> It's not trivial to do.  When R was written, and perhaps still on some 
>> obscure platforms, there wasn't any way to do that--Windows didn't 
>> support UTF-8 then, just Microsoft's version of UCS-2 and a variety of 
>> other more limited encodings.  Unix platforms didn't support UCS-2.  So 
>> internally R keeps many things in the native encoding.
>> 
>> If you decide to rewrite R from scratch now, I'd suggest that you handle 
>> things differently.  If you'd rather not rewrite it yourself, then I 
>> don't know how you will convince someone else to take on that job.
>> 
>> You might find it easier to convince Microsoft to add a UTF-8 locale, so 
>> then the native encoding would be UTF-8, and the problem would go away.
>> 
>> Duncan Murdoch

I can easily understand that this is a huge amount of work… and that the R Core Team is not ready to take it on (now? alone?). However, I am not at all happy to read such a sarcastic answer from a member of the R Core Team. This is a serious problem in R under Windows, and it would make life a lot easier for everyone if R were UTF-8 compliant on *all* supported platforms/OSes.

Philippe Grosjean 


> 
>  Would it be fairer / more productive to say/ask: 
> 
> * it would be nice if write.table could write files in UTF-8 encoding
> * is there any documentation already available about which R functions
> _do_ handle UTF-8 output on Windows, and how they do it?  
> * could they be used as models for adapting write.table to write files
> in UTF-8 encoding on Windows?
> 
>  i.e., instead of "convert R to output UTF-8 universally on Windows",
> "figure out how to make write.table output UTF-8 on Windows, or
> suggest a workaround" ?
> 
>  Ben Bolker
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 