[Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

David Byrne d@v|d@byrne222 @end|ng |rom gm@||@com
Thu Feb 7 11:17:08 CET 2019


Bug:
Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
file containing the infinity symbol ('∞') results in the infinity
symbol being imported as the number 8. Other Unicode characters seem
unaffected, for example Zhe (ж).

Expected Behavior:
The imported data.frame should represent the infinity symbol as 'Inf'
so that normal mathematical operations can be performed on it.
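
For illustration, the conversion described above could be sketched as a
post-processing step. This is only a minimal sketch, assuming the column
is read as character and the infinity symbol reaches R intact as U+221E.
parse_infinity is a hypothetical helper name, not part of R, and it does
not address the incorrect '8' conversion itself.

```r
# Hypothetical helper (not part of base R): turn strings that use the
# infinity symbol (U+221E) into numeric values with Inf / -Inf.
parse_infinity <- function(x) {
  x <- trimws(x)                               # drop surrounding whitespace
  x <- gsub("\u221e", "Inf", x, fixed = TRUE)  # replace the symbol with the text "Inf"
  as.numeric(x)                                # "Inf" and "-Inf" parse as infinite values
}

values <- parse_infinity(c("\u221e", "-\u221e", "1.5"))
print(values)
```

The "\u221e" escape is used instead of a literal symbol so that the
script itself does not depend on the encoding of the source file.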

Stack Overflow Post:
I created a question on Stack Overflow where one other member was able
to reproduce the same issue I was having. The question can be found
at:
https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int

Method to Reproduce - 1:
A simple method to reproduce this issue is to use RStudio. In the
console, type the following:
> read.table(text=" ∞", encoding="UTF-8")

The result is a data.frame with a single value of '8'.

Repeating the same with ж results in the correct, expected behavior.

Method to Reproduce - 2:
Create a .csv file containing the infinity and Zhe characters (I have
attached the file for convenience; hopefully it is not rejected by your
email service). Launch an interactive session using

> r --vanilla

Enter the following statement taking care to replace the
<path-to-file> with the appropriate one:

> read.table("<path-to-file>/unicode_chars.csv", sep=",", encoding="UTF-8")


This should result in a two-element data.frame: the first element is
the incorrect value of 8 with an additional <U+FEFF>, and the second is
the correct value of Zhe.

Note the additional <U+FEFF> prefixed to the front of the '8'. This
appears to be the byte order mark (BOM), a hidden character placed at
the start of a file to let editors know the encoding. The following
link has some explanation; however, it states this is caused by Excel.
The file I created was made with Notepad, not Excel.

https://medium.freecodecamp.org/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7
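
One possible way to avoid the stray <U+FEFF> is sketched below, under
the assumption that the file starts with a UTF-8 byte order mark. R's
file connections accept the encoding name "UTF-8-BOM" for reading, and
read.table can pass it through its fileEncoding argument. The temporary
file and its ASCII contents are made up for the demonstration, and this
has not been checked against the infinity symbol problem itself.

```r
# Write a small demonstration file that begins with the UTF-8 BOM
# (bytes EF BB BF), similar to what Notepad produced for the attached file.
path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("a,b\n")), path)

# fileEncoding = "UTF-8-BOM" asks the connection to skip a leading BOM,
# so the first field should not carry a <U+FEFF> prefix.
df <- read.table(path, sep = ",", fileEncoding = "UTF-8-BOM",
                 stringsAsFactors = FALSE)
print(df)

unlink(path)  # remove the temporary file
```

Note that fileEncoding differs from encoding: the former re-encodes the
input while reading, whereas the latter only marks the strings that were
read as being in a known encoding.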

System Details:
OS:
> Windows 10.0.17134 Build 17134


R Version:
> platform       x86_64-w64-mingw32
> arch           x86_64
> os             mingw32
> system         x86_64, mingw32
> status
> major          3
> minor          4.1
> year           2017
> month          06
> day            30
> svn rev        72865
> language       R
> version.string R version 3.4.1 (2017-06-30)
> nickname       Single Candle

