[R] R on Windows crashes when source'ing UTF-8 file
lebatsnok at gmail.com
Thu Jul 10 15:53:54 CEST 2014
Wow. Thanks a lot!
# works correctly on my Windows 7 machine
# (and without encoding argument it still crashes R)
On Thu, Jul 10, 2014 at 4:33 PM, John McKown
<john.archie.mckown at gmail.com> wrote:
> On Thu, Jul 10, 2014 at 7:18 AM, Kenn Konstabel <lebatsnok at gmail.com> wrote:
>> Dear all,
>> I found an unexpected behaviour when trying to `source` a UTF-8 file
>> on windows 7:
>> # Rgui.exe reacts:
>> # R for windows GUI has stopped working. A problem caused the program
>> to stop working correctly.
>> # Windows will close the program and notify you if a solution is available.
>> The same will happen with R.exe ("terminal") and R running within
>> RStudio. (Session and locale info below).
>> However, a non-UTF-8 version of this little script can be `source`d
>> without problems.
>> Adding the `encoding` argument to `source` helps a little:
>> source("http://psych.ut.ee/~nek/R/test-utf8.txt", encoding="utf-8")
>> # unsure about the spelling of utf-8 so I also tried UTF8, utf8, and UTF-8
>> # ... with the same result in all cases
>> R doesn't crash any more but gives the following error:
>> # Error in source("http://psych.ut.ee/~nek/R/test-utf8.txt", encoding
>> = "utf-8") :
>> # http://psych.ut.ee/~nek/R/test-utf8.txt:2:0: unexpected end of input
>> # 1: ?
>> # ^
>> # In addition: Warning message:
>> # In readLines(file, warn = FALSE) :
>> # invalid input found on input connection
> I just tried that. On Windows XP/Pro, R 3.1.0 didn't fail, but did
> get the error you mention later. I used "wget" to actually download
> the file mentioned (on Linux). I think that the problem _may_ be that
> the file starts with a BOM (Byte Order Mark), which is 0xEF, 0xBB,
> 0xBF. This is supposed to tell us that this is UTF-8.
> BOM: http://en.wikipedia.org/wiki/Byte_order_mark
> I get an identical error with R 3.1.0 on both Windows XP/Pro and Linux
> Fedora 20. The problem is that the R readLines() apparently does not
> like the leading BOM. It reads it as data. Most other Linux and
> Windows applications _do_ understand the BOM and so, when you use
> them, they work properly. And, normally, when you then save the file,
> the software does not write the BOM at the start. So it works on the
> saved version of the file.
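
A quick way to check whether a given file really starts with a BOM is to look at its first three bytes. The following is only a minimal sketch: `has_utf8_bom` is a hypothetical helper name (not part of base R), and the temporary files stand in for the script discussed in this thread.

```r
# Hypothetical helper (not part of base R): TRUE if the file starts with
# the three-byte UTF-8 BOM, EF BB BF.
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Two temporary files with the same content, one with a leading BOM.
with_bom <- tempfile(fileext = ".R")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("x <- 1\n")), with_bom)

without_bom <- tempfile(fileext = ".R")
writeBin(charToRaw("x <- 1\n"), without_bom)

has_utf8_bom(with_bom)     # TRUE
has_utf8_bom(without_bom)  # FALSE
```

Reading raw bytes with `readBin` avoids any encoding conversion, so the check does not depend on the locale.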
> Being the curious sort, I decided to look at the source to R. In
> particular in ~/R/src/main/connections.c I saw where it did support
> the reading of BOMs. But there is a special way to do it! Which I
> cannot find in the documentation.
> I tried the above AND IT WORKED properly!
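
The exact command is not preserved in this copy of the message. One approach that does work in R is the "UTF-8-BOM" encoding name, which `file()` accepts for reading and which drops a leading BOM if one is present. The sketch below assumes that is the mechanism meant here; the temporary file, the variable `bom_test_value`, and the environment `env` are made up for illustration.

```r
# Write a small script that starts with a UTF-8 BOM, as Notepad would.
script <- tempfile(fileext = ".R")
writeBin(
  c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("bom_test_value <- 42\n")),
  script
)

# Declare the connection as "UTF-8-BOM" so the BOM is skipped on reading,
# then source the connection into a separate environment.
env <- new.env()
con <- file(script, encoding = "UTF-8-BOM")
source(con, local = env)
close(con)

get("bom_test_value", envir = env)  # 42
```

For the file in this thread, the same idea would mean passing the URL instead of `script`, but that variant is untested here.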
> I simply adore having source code.
>> I thought maybe that's because what notepad told me is UTF-8 is
>> actually something else ... so I did two more experiments.
>> # this was created on a linux machine with leafpad, and saved as utf-8 text
>> # it can be source'd on windows
>> # the same as previous but o's in file were replaced by ö's
>> # can be source'd on windows but the "ö" character is shown as Ć¶
>> # except if you add encoding="utf-8" - then it works as expected
>> So in sum, I can create "plain text" (saved with utf-8 encoding) files
>> on windows that cannot be sourced to R on windows, or will crash R
>> (depending on how you source them). The same files can be sourced on
>> linux without problems. Part of the problem is obviously in windows
>> but maybe R should at least not crash.
>> Session info:
>> R version 3.0.2 (2013-09-25)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>>  LC_COLLATE=Estonian_Estonia.1257 LC_CTYPE=Estonian_Estonia.1257
>>  LC_MONETARY=Estonian_Estonia.1257 LC_NUMERIC=C
>>  LC_TIME=Estonian_Estonia.1257
>> attached base packages:
>>  stats graphics grDevices utils datasets methods base
>> loaded via a namespace (and not attached):
>>  tools_3.0.2
>> OS: Windows 7
>> Linux Mint Debian Edition and R 3.0.2 on the other machine (where
>> everything worked).
>> I was trying to find out how to make files that could be source'd on
>> both windows and linux. This is partly solved so I have no specific
>> question other than "is this a bug in the windows version?" but any
>> comments on the general topic would be appreciated too.
>> Best regards,
>> Kenn Konstabel
>> Research fellow
>> Department of chronic diseases
>> National Institute of Health Development
>> Hiiu 42
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
> Maranatha! <><
> John McKown