[Rd] read.spss (foreign) conflict with SPSS 17 files.

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Dec 15 07:51:34 CET 2008


See the help page.   We haven't been told but it looks like the Debian 
system is in a UTF-8 locale: reencode=FALSE is likely appropriate there.
However, the posting guide does ask for the output of sessionInfo() for a 
good reason.
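
As a rough sketch of the suggestion above (not code from the foreign 
package): check whether the session is in a UTF-8 locale and, if so, pass 
reencode=FALSE. The helper names in_utf8_locale() and read_sav() are 
hypothetical, introduced only for illustration.

```r
library(foreign)

# TRUE when the current session runs in a UTF-8 locale.
in_utf8_locale <- function() isTRUE(l10n_info()[["UTF-8"]])

# Hypothetical wrapper: skip re-encoding in a UTF-8 locale, otherwise
# keep read.spss()'s default behaviour (reencode = NA).
read_sav <- function(path) {
  reencode <- if (in_utf8_locale()) FALSE else NA
  read.spss(path, reencode = reencode)
}

# The locale and package versions that the posting guide asks for.
print(sessionInfo())
```

Calling read_sav("/home/jeroen/samples/Tomato.sav") would then apply that 
choice to the file from the report, assuming the file exists on the machine.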

Yes, it looks like 65001 is UTF-8, but we don't know for certain.  I am 
planning on assuming so for the next release of foreign, which will follow 
R 2.8.1 early in the next year.
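
To illustrate what assuming "65001 is UTF-8" could look like, here is a 
minimal sketch. It is not the actual code in foreign, and 
codepage_to_encoding() is a hypothetical name; the idea is only to hand 
iconv() the name "UTF-8" rather than "CP65001", which the iconv on the 
Debian system rejected.

```r
# Hypothetical mapping from the code page recorded in a .sav file to an
# encoding name for iconv(). Treating 65001 as UTF-8 is an assumption.
codepage_to_encoding <- function(codepage) {
  if (codepage == 65001) "UTF-8" else paste0("CP", codepage)
}

enc <- codepage_to_encoding(65001)

# Convert a sample string from the assumed file encoding to the
# encoding of the current locale (to = "").
iconv("Tomato", from = enc, to = "")
```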

I think the title is rather off: this is more what read.spss does about 
undocumented features of SPSS formats (and record type 7, subtype 20 is 
another such feature).

On Mon, 15 Dec 2008, Peter Dalgaard wrote:

> Jeroen Ooms wrote:
>> SPSS seems to have changed its default datafile format, resulting in issues
>> for read.spss(). In Windows this results in a warning, in Debian the import
>> completely fails:
>> 
>> Debian (R version 2.8.0 (2008-10-20) i486-pc-linux-gnu, foreign_0.8-29)
>> 
>>> read.spss("/home/jeroen/samples/Tomato.sav")
>> Error in iconv(names(rval), cp, "") :
>>   unsupported conversion from 'CP65001' to ''
>> In addition: Warning messages:
>> 1: In read.spss("/home/jeroen/samples/Tomato.sav") :
>>   /home/jeroen/samples/Tomato.sav: File-indicated character representation
>> code (65001) looks like a Windows codepage
>> 2: In read.spss("/home/jeroen/samples/Tomato.sav") :
>>   /home/jeroen/samples/Tomato.sav: Unrecognized record type 7, subtype 20
>> encountered in system file
>> 
>> 
>> windows (R version 2.8.0 (2008-10-20), foreign_0.8-29)
>> 
>>> read.spss("C:/Program
>>> Files/SPSSInc/Statistics17/Samples/English/Tomato.sav")
>> 
>> ...
>>  attr(,"codepage")
>> [1] 65001
>> 
>> Warning messages:
>> 1: In read.spss("C:/Program
>> Files/SPSSInc/Statistics17/Samples/English/Tomato.sav") :
>>   C:/Program Files/SPSSInc/Statistics17/Samples/English/Tomato.sav:
>> File-indicated character representation code (65001) looks like a Windows
>> codepage
>> 2: In read.spss("C:/Program
>> Files/SPSSInc/Statistics17/Samples/English/Tomato.sav") :
>>   C:/Program Files/SPSSInc/Statistics17/Samples/English/Tomato.sav:
>> Unrecognized record type 7, subtype 20 encountered in system file
>> 
>> 
>> I've shared some sample datafiles that are included with SPSS, so you can
>> take a look: http://jeroen.xlshosting.net/samples/
>> I hope there is a fix, I think importing data from SPSS is a very popular
>> feature.

We do prefer people to export from SPSS in a documented format.

>> Thank you!
>
>
> Thanks,
>
> It looks like adding reencode="utf8" removes the iconv message. The warnings 
> appear to be harmless.
>
> In fact, reencode="ascii" works for me as well on the Tomato.sav file. 
> However as far as I can google, Code Page 65001 _is_ UTF-8...
>
> --
>   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

