[R] read.spss, locale and encodings

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Apr 8 16:17:51 CEST 2009


Hans Ekbrand wrote:
> On Wed, Apr 08, 2009 at 03:03:06PM +0200, Peter Dalgaard wrote:
>> Hans Ekbrand wrote:
>>> I must be missing something obvious here:
>>>
>>> According to the help page for read.spss, the reencode option is only
>>> active when R is run under a UTF-8 locale.
>> Not in my version:
>>
>> reencode: logical: should character strings be re-encoded to the
>>           current locale.  The default, 'NA', means to do so in a UTF-8
>>           locale, only.  Alternatively character, specifying an
>>           encoding to assume.
> 
> OK, thanks for that correction, but the problem isn't solved, since
> read.spss fails (see below). When read.spss succeeds, the option is
> not useful, since the current locale is then iso88591(5).
> 
>> So, does it help with reencode="Latin1"? Presumably this comes from  
>> assuming UTF-8 when it isn't.
> 
>> Sys.getlocale()
> [1] "LC_CTYPE=sv_SE.UTF-8;LC_NUMERIC=C;LC_TIME=sv_SE.UTF-8;LC_COLLATE=sv_SE.UTF-8;LC_MONETARY=sv_SE.UTF-8;LC_MESSAGES=sv_SE.utf8;LC_PAPER=sv_SE.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=sv_SE.utf8;LC_IDENTIFICATION=C"
>> test <- read.spss("wo.sav", to.data.frame=TRUE, reencode="Latin1")
> Error in read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") : 
>   error reading system-file header
> In addition: Warning message:
> In read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
>   wo.sav: position 143: Variable name begins with invalid character
> 
> Using another version of the dataset, where I have successfully
> encoded the names to UTF-8, here is the problematic variable name:
> 
> names(Workorientation.2005.Swe)[143]
> [1] "KÖN1"
> 
>> 8.34 is used in the current prerelease. AFAIR, some issues with
>> encodings were fixed recently.
> 
> Is anyone running foreign 8.34 willing to test my SPSS file?

Someone with an SPSS file problem willing to help test the prereleases? :-)

You could start by placing it somewhere accessible...

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907