[R] R on Windows crashes when source'ing UTF-8 file

Kenn Konstabel lebatsnok at gmail.com
Thu Jul 10 14:18:52 CEST 2014


Dear all,

I found unexpected behaviour when trying to `source` a UTF-8 file
on Windows 7:

source("http://psych.ut.ee/~nek/R/test-utf8.txt")

# Rgui.exe reacts:
# R for windows GUI has stopped working. A problem caused the program
# to stop working correctly.
# Windows will close the program and notify you if a solution is available.

The same happens with R.exe ("terminal") and with R running within
RStudio. (Session and locale info below.)

However, a non-UTF-8 version of this little script can be `source`d
without problems.

source("http://psych.ut.ee/~nek/R/test.txt")

Adding the `encoding` argument to `source` helps a little:

source("http://psych.ut.ee/~nek/R/test-utf8.txt", encoding="utf-8")
#  unsure about the spelling of utf-8 so I also tried UTF8, utf8, and UTF-8
# ... with the same result in all cases

R doesn't crash any more but gives the following error:

# Error in source("http://psych.ut.ee/~nek/R/test-utf8.txt", encoding = "utf-8") :
#   http://psych.ut.ee/~nek/R/test-utf8.txt:2:0: unexpected end of input
# 1: ?
#    ^
# In addition: Warning message:
# In readLines(file, warn = FALSE) :
#   invalid input found on input connection 'http://psych.ut.ee/~nek/R/test-utf8.txt'

I thought that maybe what Notepad told me is UTF-8 is actually
something else, so I did two more experiments.
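
One way to check what an editor actually wrote is to look at the first
bytes of the file. The following is only a minimal sketch and was not run
against the files above: `has_utf8_bom` is a hypothetical helper, and it
assumes a local copy of the file (a file behind a URL would first have to
be downloaded in binary mode). Some Windows editors prefix UTF-8 files
with the byte order mark EF BB BF; whether that plays a role here is a
guess, not something established in this report.

```r
# Hypothetical helper (not part of base R): TRUE if the file starts with
# the UTF-8 byte order mark EF BB BF.
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Self-contained demonstration using two temporary files.
with_bom <- tempfile(fileext = ".R")
without_bom <- tempfile(fileext = ".R")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("x <- 1\n")), with_bom)
writeBin(charToRaw("x <- 1\n"), without_bom)

has_utf8_bom(with_bom)     # file written with a BOM
has_utf8_bom(without_bom)  # file written without a BOM
```

Reading the bytes with `readBin` avoids any re-encoding, so the result
does not depend on the locale of the session.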

source("http://psych.ut.ee/~nek/R/test2.R")
# this was created on a Linux machine with Leafpad, and saved as UTF-8 text
# it can be source'd on Windows

source("http://psych.ut.ee/~nek/R/test3.R")
# the same as the previous file, but the o's in the file were replaced by ö's
# can be source'd on Windows but the "ö" character is shown as ƶ
# except if you add encoding="utf-8" - then it works as expected

So in sum, I can create "plain text" files (saved with UTF-8 encoding)
on Windows that either cannot be sourced into R on Windows or crash R
(depending on how you source them). The same files can be sourced on
Linux without problems. Part of the problem is obviously in Windows,
but R should at least not crash.

Session info:

R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Estonian_Estonia.1257  LC_CTYPE=Estonian_Estonia.1257
[3] LC_MONETARY=Estonian_Estonia.1257 LC_NUMERIC=C
[5] LC_TIME=Estonian_Estonia.1257

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.2


OS: Windows 7

Linux Mint Debian Edition and R 3.0.2 on the other machine (where
everything worked).

Context:

I was trying to find out how to make files that could be source'd on
both Windows and Linux. This is partly solved, so I have no specific
question other than "is this a bug in the Windows version?", but any
comments on the general topic would be appreciated too.
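
One possible approach to portable files, sketched here as an assumption
rather than a tested fix for the files above, is to keep the script itself
pure ASCII and to write non-ASCII characters inside strings as `\uXXXX`
escapes. `non_ascii_lines` is a hypothetical helper that reports which
lines of a script contain bytes outside the ASCII range:

```r
# Hypothetical helper (not part of base R): returns the indices of the
# lines that contain at least one byte outside the ASCII range.
non_ascii_lines <- function(lines) {
  has_high_byte <- vapply(
    lines,
    function(line) any(as.integer(charToRaw(line)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(has_high_byte)
}

script_lines <- c(
  "x <- 'o'",       # plain ASCII
  "y <- '\u00f6'",  # contains a literal o-umlaut
  "z <- '\\u00f6'"  # ASCII only: the escape is left for the parser
)
non_ascii_lines(script_lines)  # reports line 2 only
```

Lines read from a real file with `readLines` could be passed to the same
helper in place of `script_lines`.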

Best regards,

Kenn


Kenn Konstabel
Research fellow
Department of chronic diseases
National Institute of Health Development
Hiiu 42
Tallinn
Estonia


