[R] Converting a whole dataframe (including attributes) from latin1 to UTF-8

Hans Ekbrand hans.ekbrand at sociology.gu.se
Wed Apr 8 01:11:17 CEST 2009


Hi list!

Short version: How do I convert a whole data.frame from latin1
encoding to utf8?

I get SPSS files with latin1 encoding. My OS is GNU/Linux and the
locale sv_SE.utf8, and I normally interface R with Emacs/ESS. I have
used the following hack to convert a data.frame in latin1 to utf8:

> library(foreign)
> Sys.setlocale(category = "LC_ALL", locale = "sv_SE.iso88591")
> foo <- read.spss("foo.sav", to.data.frame=TRUE)
> write.table(foo, "foo.data")
$ recode lat1..utf8 foo.data
> Sys.setlocale(category = "LC_ALL", locale = "sv_SE.utf8")
> foo <- read.table("foo.data")

I have now found two problems with this approach: 

a) variable.labels is dropped
b) the order of the levels of unordered factors is changed

I had just worked out a hack for a) when I noticed b). b) is a
problem when the factors really are ordered but not recognized as such
by read.spss (and/or not defined as such in SPSS; since SPSS
respects the numeric values of the factors anyway, users don't need
to declare them).
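
Both problems can be reproduced without any SPSS file. The following
is only a sketch on made-up toy data (the column name and the label are
hypothetical), and it passes stringsAsFactors = TRUE explicitly because
that was the read.table default at the time of writing:

```r
# Hypothetical toy data: factor levels in a meaningful, non-alphabetical order.
foo <- data.frame(size = factor(c("small", "medium", "large"),
                                levels = c("small", "medium", "large")))
attr(foo, "variable.labels") <- c(size = "Size of household")

# Round trip through a text file, as in the hack above.
tmp <- tempfile()
write.table(foo, tmp)
bar <- read.table(tmp, stringsAsFactors = TRUE)

levels(foo$size)              # order as defined above
levels(bar$size)              # levels rebuilt in sorted order
attr(bar, "variable.labels")  # the attribute did not survive the round trip
```

The text file only stores the cell values, so neither the level order
nor any attribute of the data.frame can come back from it.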

Rather than hack around b) too, I wonder if anyone on the list knows
how to convert a whole data.frame from latin1 encoding to utf8?
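
To make the question concrete, this is roughly what I imagine such a
conversion could look like: a minimal sketch using iconv() on the
pieces of the data.frame instead of a round trip through a file. The
helper name latin1_to_utf8 is made up, and I am assuming that the only
attribute that needs converting is variable.labels:

```r
# Hypothetical helper, not part of any package: re-encode the strings of a
# data.frame from latin1 to UTF-8 without writing it to a file.
latin1_to_utf8 <- function(df) {
  conv <- function(x) iconv(x, from = "latin1", to = "UTF-8")
  labs <- attr(df, "variable.labels")   # saved first, restored at the end

  for (i in seq_along(df)) {
    if (is.factor(df[[i]])) {
      # Only the level strings change; level order and codes stay as they are.
      levels(df[[i]]) <- conv(levels(df[[i]]))
    } else if (is.character(df[[i]])) {
      df[[i]] <- conv(df[[i]])
    }
  }
  names(df) <- conv(names(df))

  if (!is.null(labs)) {
    new_labs <- conv(labs)
    if (!is.null(names(labs))) names(new_labs) <- conv(names(labs))
    attr(df, "variable.labels") <- new_labs
  }
  df
}
```

With something like this, the session could stay in the sv_SE.utf8
locale and the call would be
foo <- latin1_to_utf8(read.spss("foo.sav", to.data.frame=TRUE)).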

TIA

-- 
Hans Ekbrand (http://sociologi.cjb.net) <hans at sociologi.cjb.net>
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?