[R] Reading Chinese Language (GB2312) Input

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Oct 27 09:12:34 CEST 2012


On 26/10/2012 18:25, jgreenb1 wrote:
> I am trying to read a csv file with Chinese language text in it. The file
> should look like this:
>
> userid,jobid,Title,companyid,industryids1
> 82497,1160,互联网产品经理,12
> 96429,658,企划经理(商业公司),24
> 14471,95,产品运营经理,25,6
> 14471,1708,产品营销高级经理,727,2
> 14471,1558,产品总监,611,4
> 14471,1777,产品总监,743,1
> 14471,1697,产品经理,725,234
> 14471,1716,度假产品总监 ,730,234
> 14471,1717,产品经理,730,5
> but when I read the data in using read.csv() it looks like this in the R
> console:

How exactly?  Did you use the fileEncoding or encoding argument (see the 
help page)?

>
>    userid jobid                Title companyid industryids1
> 1  82497  1160       »¥ÁªÍø²úÆ·¾­Àí        12           NA
> 2  96429   658 Æó»®¾­Àí£¨ÉÌÒµ¹«Ë¾£©        24           NA
> 3  14471    95         ²úÆ·ÔËÓª¾­Àí        25            6
> 4  14471  1708     ²úÆ·ÓªÏú¸ß¼¶¾­Àí       727            2
> 5  14471  1558             ²úÆ·×Ü¼à       611            4
> 6  14471  1777             ²úÆ·×Ü¼à       743            1
> 7  14471  1697             ²úÆ·¾­Àí       725          234
> 8  14471  1716        ¶È¼Ù²úÆ·×Ü¼à        730          234
> 9  14471  1717             ²úÆ·¾­Àí       730            5
> How can I read this in properly?

By using the fileEncoding and encoding arguments of read.csv() (both are 
described on the ?read.table help page).
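
As a rough sketch of what that could look like (an illustration, not the poster's actual call): fileEncoding declares the encoding of the file on disk so that R re-encodes the input as it reads, whereas encoding only marks the strings as Latin-1 or UTF-8 and does not re-encode anything. The temporary file and its single data row below are made up so that the example is self-contained, and it assumes an R session whose native encoding can represent Chinese text (for example a UTF-8 session). In a session like the one reported below (code page 1252) the re-encoding to the native encoding would probably fail.

```r
# Build a tiny GB2312-encoded CSV in a temp file (illustrative data only).
lines_utf8 <- c("userid,jobid,Title,companyid",
                "14471,1558,\u4ea7\u54c1\u603b\u76d1,611")
raw_gb <- iconv(paste(lines_utf8, collapse = "\n"),
                from = "UTF-8", to = "GB2312", toRaw = TRUE)[[1]]
path <- tempfile(fileext = ".csv")
writeBin(c(raw_gb, as.raw(0x0a)), path)   # append a final newline

# fileEncoding names the encoding of the file; R converts the input to the
# session's native encoding while reading.
dat <- read.csv(path, fileEncoding = "GB2312", stringsAsFactors = FALSE)
str(dat)
```

If the file were already UTF-8, read.csv(path, encoding = "UTF-8") would instead just mark the strings as UTF-8 without converting them.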

>
> Session info:
>
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252

However, you will most likely not be able to display it in that locale 
unless you select non-default fonts: see the rw-FAQ.

> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
> [1] tools_2.14.1


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



