[R] read.spss, locale and encodings

Hans Ekbrand hans.ekbrand at sociology.gu.se
Wed Apr 8 15:34:53 CEST 2009


On Wed, Apr 08, 2009 at 03:03:06PM +0200, Peter Dalgaard wrote:
> Hans Ekbrand wrote:
>> I must be missing something obvious here:
>>
>> According to the help page for read.spss, the reencode option is only
>> active when R is run under a UTF-8 locale.
>
> Not in my version:
>
> reencode: logical: should character strings be re-encoded to the
>           current locale.  The default, 'NA', means to do so in a UTF-8
>           locale, only.  Alternatively character, specifying an
>           encoding to assume.

OK, thanks for that correction, but it doesn't solve the problem:
read.spss fails, see below. And when read.spss does succeed, the option
is not useful, since the current locale is then ISO-8859-1 (or ISO-8859-15).

> So, does it help with reencode="Latin1"? Presumably this comes from  
> assuming UTF-8 when it isn't.

> Sys.getlocale()
[1] "LC_CTYPE=sv_SE.UTF-8;LC_NUMERIC=C;LC_TIME=sv_SE.UTF-8;LC_COLLATE=sv_SE.UTF-8;LC_MONETARY=sv_SE.UTF-8;LC_MESSAGES=sv_SE.utf8;LC_PAPER=sv_SE.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=sv_SE.utf8;LC_IDENTIFICATION=C"
> test <- read.spss("wo.sav", to.data.frame=TRUE, reencode="Latin1")
Error in read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") : 
  error reading system-file header
In addition: Warning message:
In read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
  wo.sav: position 143: Variable name begins with invalid character

In another version of the dataset, where I have successfully
re-encoded the names to UTF-8, the problematic variable name is:

names(Workorientation.2005.Swe)[143]
[1] "KÖN1"
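A minimal R sketch of why a name like this is encoding-sensitive. It assumes (not confirmed for wo.sav) that the file stores the name in Latin-1, where Ö is the single byte 0xD6. Those bytes are not valid UTF-8, which is the kind of mismatch a reader that expects UTF-8 can trip over. Whether this is actually what triggers the read.spss error is not established here. validUTF8() needs R 3.3.0 or later, which is newer than the versions discussed in this thread.

```r
# Assumption: the .sav file stores "KÖN1" in Latin-1, where Ö is the single byte 0xD6
latin1_bytes <- as.raw(c(0x4B, 0xD6, 0x4E, 0x31))
name <- rawToChar(latin1_bytes)

# 0xD6 would start a two-byte UTF-8 sequence, but 0x4E is not a continuation byte
validUTF8(name)        # FALSE

# Declare the bytes as Latin-1, then convert them to UTF-8
Encoding(name) <- "latin1"
name_utf8 <- enc2utf8(name)
charToRaw(name_utf8)   # 4b c3 96 4e 31  (Ö becomes the UTF-8 pair C3 96)
validUTF8(name_utf8)   # TRUE
```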

> 8.34 is used in the current prerelease. AFAIR, some issues with
> encodings were fixed recently.

Is anyone running foreign 8.34 willing to test my SPSS file?

-- 
Hans Ekbrand (http://sociologi.cjb.net) <hans at sociologi.cjb.net>


More information about the R-help mailing list