[R] Please guide -- UTF-8 locale setting fails on Windows on writing

Milan Bouchet-Valat nalimilan at club.fr
Mon Mar 28 18:28:38 CEST 2016


On Monday 28 March 2016 at 20:12 +0530, Sunny Singha wrote:
> Milan,
> OK, let me take the case of Facebook. I used the Rfacebook package
> to get posts (getPost()), which returns a list() of data frames (post,
> comments, likes).
> 
> Let me demonstrate 2 cases of read and write, just as you suggested.
> Case 1:::::::::
> Let's say one of the Facebook comments has the string value below, in
> Japanese language-->
> "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
> 
> On the R console I now assign the above string to a variable as: x <- "世界餐福事工 -
> 餐廳員工沒精打采 老是打盤子"
> and write it as below:
> write.csv(x, file='x.csv', row.names=F, fileEncoding='UTF-8')
> I get this string in the file
> "" -
>  "
But how do you read back the contents of the file? You need to specify
the encoding when reading it too.
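
As a rough sketch of what I mean by checking the round trip (not a
definitive recipe): the code below writes a UTF-8 string to a temporary
file and reads it back with the encoding declared. The column name "text"
and the temporary path are made up for illustration. I am assuming here
that writing the bytes directly with useBytes = TRUE and reading with
encoding = "UTF-8" (which only marks the strings as UTF-8, instead of
re-encoding them to the native code page as fileEncoding does) is
acceptable for your data.

```r
# Example string built from Unicode escapes so the script itself stays ASCII.
x <- "\u4e16\u754c\u9910\u798f\u4e8b\u5de5"

# Hypothetical temporary file used only for this demonstration.
path <- tempfile(fileext = ".csv")

# Write a header line and the string as raw UTF-8 bytes, without
# routing them through the native encoding.
con <- file(path, open = "wb")
writeLines(c("text", enc2utf8(x)), con, useBytes = TRUE)
close(con)

# Read the file back, declaring that its strings are UTF-8.
back <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Compare code points rather than printed output, since the console
# itself may not be able to display the characters.
same <- identical(utf8ToInt(back$text), utf8ToInt(x))
```

If `same` is TRUE, the file holds the right bytes and any remaining
garbling comes from how the file is displayed, not from how it was
written.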

> Case 2::::::::::::::
> I create a file 'x.txt' in Notepad, save the Japanese string "世界餐福事工 - 餐廳員工沒精打采 老是打盤子" in it,
> and read it as below:
> read.table('x.txt', fileEncoding='UTF-8'). I get the output below:
> 
>   V1
> 1  ?
> Warning messages:
> 1: In read.table("x.txt", fileEncoding = "UTF-8") :
>   invalid input found on input connection 'x.txt'
> 2: In read.table("x.txt", fileEncoding = "UTF-8") :
>   incomplete final line found by readTableHeader on 'x.txt'
Are you sure the notepad saved the text as UTF-8?
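
One way to check, sketched under some assumptions: Notepad's "Unicode"
option writes UTF-16 and its "UTF-8" option prepends a byte order mark
(BOM), so the first bytes of the file tell you a lot. The helper name
guess_bom() below is made up for this example; it only inspects the BOM
and cannot tell ANSI apart from BOM-less UTF-8.

```r
# Return a best-guess label for a file's encoding from its leading bytes.
guess_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3L)
  if (length(b) >= 3L && identical(b[1:3], as.raw(c(0xEF, 0xBB, 0xBF)))) {
    "UTF-8 with BOM"
  } else if (length(b) >= 2L && identical(b[1:2], as.raw(c(0xFF, 0xFE)))) {
    "UTF-16LE"
  } else if (length(b) >= 2L && identical(b[1:2], as.raw(c(0xFE, 0xFF)))) {
    "UTF-16BE"
  } else {
    "no BOM (ANSI or BOM-less UTF-8)"
  }
}

# Demo on a temporary file that starts with a UTF-8 BOM.
path <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("abc\n")), path)
result <- guess_bom(path)
```

Depending on the answer, a fileEncoding value of "UTF-8-BOM" or
"UTF-16LE" may be a better match for the file than plain "UTF-8".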

> The above was for demonstration. I'm in fact reading extracted social
> media data, which ultimately is fetched somewhere using the httr package
> and returned as data frames.
> I'm not sure how I should get it handled in Windows, as I don't observe
> this behavior on Mac, where the system locale is set to 'en_US.UTF-8'.
> 
> Regards,
> Sunny
> 
> 
> 
> 
> On Mon, Mar 28, 2016 at 7:39 PM, Milan Bouchet-Valat wrote:
> > 
> > On Monday 28 March 2016 at 19:16 +0530, Sunny Singha wrote:
> > > 
> > > Hi,
> > > I think I'm experiencing an issue regarding the system locale. I have
> > > exported '.csv' formatted data frames gathered from various social
> > > media platforms like Facebook/Twitter/G+, etc.
> > > 
> > > I observe that many variables/columns consist of strings formatted similar to below:
> > > "
> > > "
> > > 
> > > As expected for social media data, and as I confirmed, they are strings in
> > > different languages.
> > > Platform details are provided at the end of this mail. The OS locale is set
> > > to English (United States), hence the 'R' locale is 'English_United
> > > States.1252'.
> > > 
> > > I have attempted to change it to UTF-8 but receive the warning message below:
> > > 
> > > Warning message:
> > > In Sys.setlocale("LC_ALL", "UTF-8") :
> > >   OS reports request to set locale to "UTF-8" cannot be honored
> > You don't need to set the locale. Just pass an appropriate value (e.g.
> > "UTF-8") to read.csv() or write.csv()'s fileEncoding argument.
> > 
> > You also didn't tell us what program you used to read these files. Some
> > might guess the encoding incorrectly, or require you to choose it
> > manually.
> > 
> > 
> > Regards
> > 
> > > 
> > > I have gone through below forums but no resolution so far:
> > > --- http://stackoverflow.com/questions/20571147/how-to-set-unicode-locale-in-r
> > > --- https://stat.ethz.ch/pipermail/r-devel/2013-November/067940.html
> > > --- http://stackoverflow.com/questions/19877676/write-utf-8-files-from-r
> > > --- https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
> > > --- http://withr.me/configure-character-encoding-for-r-under-linux-and-windows/
> > > 
> > > I'm not sure whether the issue occurs while reading/extracting the data
> > > from the media or while writing/exporting to a Windows directory, but I
> > > don't experience a similar issue on my personal Mac. I need some
> > > clarification here.
> > > 
> > > How can I export the data just as I see it on the web? Please guide.
> > > 
> > > Regards,
> > > Sunny
> > > 
> > > Platform I'm using::::::::::::::::::::::::::::
> > > Operating System : Windows 7 Professional SP1
> > > R version details:
> > > platform       x86_64-w64-mingw32
> > > arch           x86_64
> > > os             mingw32
> > > system         x86_64, mingw32
> > > status
> > > major          3
> > > minor          2.3
> > > year           2015
> > > month          12
> > > day            10
> > > svn rev        69752
> > > language       R
> > > version.string R version 3.2.3 (2015-12-10)
> > > nickname       Wooden Christmas-Tree
> > > 
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.


