[R] unz() ignores encoding argument

Stefan Evert stefanML at collocations.de
Mon Sep 20 15:39:18 CEST 2010


Hi!

I'm trying to read individual files from a ZIP archive using the unz() function.  Some of the files contain non-ASCII characters, and I'd like to avoid unpacking them into a temporary directory.

My problem is that unz() seems to ignore the encoding="latin1" argument, which I need in order to read the non-ASCII characters properly.  I can't find a clear indication in the documentation that this is expected behaviour, except for the remark that "unz reads (only) single files within zip files, in binary mode" (and a short comment further down that re-encoding only works for text connections).
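A possible workaround, as a minimal sketch only: read the lines through unz() without relying on its encoding argument, then convert them explicitly with iconv().  The helper names read_unz_lines() and recode_to_utf8() are hypothetical (they are not part of R), and the sketch assumes the archive member really is Latin-1 text.

```r
# Hypothetical helper: convert strings from a known source encoding to UTF-8.
recode_to_utf8 <- function(x, from = "latin1") {
  iconv(x, from = from, to = "UTF-8")
}

# Hypothetical helper: read one member of a ZIP archive as lines of text
# and re-encode them afterwards, since unz() does not seem to do it.
read_unz_lines <- function(zipfile, member, from = "latin1") {
  con <- unz(zipfile, member, open = "r")  # the encoding= argument is not used
  on.exit(close(con))                      # always release the connection
  lines <- readLines(con, warn = FALSE)    # bytes arrive unconverted
  recode_to_utf8(lines, from = from)
}

# Usage (archive and member names are made up for illustration):
# txt <- read_unz_lines("corpus.zip", "texts/doc1.txt")
```

This keeps everything in memory, so nothing has to be unpacked to a temporary directory.  Passing encoding="latin1" to readLines() might be an alternative, but as far as I understand that only marks the strings as Latin-1 rather than converting them.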

Digging a bit in the source code, the ultimate cause seems to be this line in the unz_open() C-level function, on line 359 of src/main/dounzip.c:

>     /* set_iconv(); not yet */

Any ideas why this is commented out?  The previous lines set up con->text appropriately and con->encname was set by do_unz(), so I don't see an obvious reason why the iconv layer can't be added.

I'm working on R 2.11.1:

>                _                            
> platform       i386-apple-darwin9.8.0       
> arch           i386                         
> os             darwin9.8.0                  
> system         i386, darwin9.8.0            
> status                                      
> major          2                            
> minor          11.1                         
> year           2010                         
> month          05                           
> day            31                           
> svn rev        52157                        
> language       R                            
> version.string R version 2.11.1 (2010-05-31)

but have been looking at the current R-devel source code, so I suspect my problem won't just go away with the next release.


Best regards,
Stefan Evert

[ stefan.evert at uos.de | http://purl.org/stefan.evert ]


