[Rd] unexpected behavior of unzip with list=T and unzip=/usr/bin/unzip

Tomas Kalibera
Tue Oct 9 16:40:39 CEST 2018


Hi Paul,

thanks for the report. Fixed in R-devel 75417.

Best
Tomas

On 07/04/2018 10:08 PM, Paul Schrimpf wrote:
> Hello,
>
> I encountered some unexpected behavior of unzip when using info-zip's unzip
> instead of R's internal program. Specifically, unzip("file.zip", list=TRUE,
> unzip="/usr/bin/unzip") produces incorrect output if the zip archive has
> filenames with spaces, and results in an error if the zip archive includes
> an archive comment or file comments.
>
> Here is some code to reproduce the problem, along with the attached files:
>
> ## (mostly) expected behavior
> res.intern <- unzip("noSpaces.zip",list=TRUE)
> res.infozip <- unzip("noSpaces.zip",list=TRUE,unzip="/usr/bin/unzip")
>
> identical(res.intern,res.infozip) ## FALSE, but expected given what the
>                                    ## documentation says about dates
> identical(res.infozip$Name,res.intern$Name)     ## TRUE
> res.infozip$Length==res.intern$Length           ## TRUE
> identical(res.infozip$Length,res.intern$Length) ## FALSE, because the former
>                                                  ## is numeric, the latter integer
>
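The difference between the last two comparisons comes down to storage type. A minimal sketch with made-up lengths (the values 5 and 7 are illustrative only, not taken from the attached archives):

```r
# Illustrative values only: integer lengths (as reported for the internal
# method) versus double lengths (as reported for the external program).
len_internal <- c(5L, 7L)
len_external <- c(5, 7)

# "==" compares values after coercion, so every element matches.
same_values <- all(len_internal == len_external)

# identical() also compares the storage type, so integer vs double differs.
same_object <- identical(len_internal, len_external)

# Coercing to a common type first, or using all.equal(), removes the mismatch.
same_after_cast <- identical(as.numeric(len_internal), len_external)
nearly_equal <- isTRUE(all.equal(len_internal, len_external))
```

So a caller that needs to compare results from the two methods would probably want `all.equal()` or an explicit coercion rather than `identical()`.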
> ## More problematic cases
> print(unzip("fileNameWithSpaces.zip",list=TRUE))
> print(unzip("fileNameWithSpaces.zip",list=TRUE,unzip="/usr/bin/unzip"))
>        ## read.table is used to parse output of unzip -l, and gets
>        ## confused by extra spaces
>
> print(unzip("withArchiveComment.zip",list=TRUE))
> print(unzip("withArchiveComment.zip",list=TRUE,unzip="/usr/bin/unzip"))
>        ## produces an error
>
> print(unzip("entryComments.zip",list=TRUE))
> print(unzip("entryComments.zip",list=TRUE,unzip="/usr/bin/unzip"))
>        ## produces an error
>
> Looking at the code for R's unzip, the basic problem is that it makes
> several assumptions about the format of the output of "unzip -l" that are
> not always true and are never verified.
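To illustrate why whitespace-delimited parsing is fragile here, the sketch below tokenizes two hand-written entry lines. The lines are an assumption modeled on the UnZip 6.00 listing layout, not captured output, and the code is not R's actual implementation.

```r
# Hypothetical entry lines in the style of "unzip -l"; real output may
# differ between unzip versions and builds.
entries <- c(
  sprintf("%9d  %s %s   %s", 5L, "2018-07-04", "12:00", "plain.txt"),
  sprintf("%9d  %s %s   %s", 7L, "2018-07-04", "12:01", "name with spaces.txt")
)

# Splitting on runs of whitespace gives a different field count per line.
fields <- strsplit(trimws(entries), "[[:space:]]+")
n_fields <- lengths(fields)

# Taking the fourth field as the name truncates names that contain spaces.
fourth_field <- vapply(fields, function(f) f[4L], character(1))
```

Any parser that expects exactly four whitespace-separated columns will either fail on the second line or report only the first word of its name.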
>
> It's unclear to me whether R's unzip should be expected to be compatible
> with all sorts of external unzip programs, so perhaps a sufficient solution
> is simply to revise the documentation (which already mentions potential
> problems with dates when combining unzip, list=TRUE, and external programs).
>
> Alternatively, R's unzip function could be changed to work with info-zip
> unzip by:
> (1) passing "-ql" instead of just "-l" when list=TRUE, to suppress the
> printing of comments
> (2) not using read.table to parse the output of unzip, and instead doing
> something like the following (which is an admittedly messy workaround)
>
>              ## "-ql" lists quietly, without archive or entry comments
>              args <- c("-ql", shQuote(zipfile))
>              res <- if (WINDOWS)
>                  system2(unzip, args, stdout = TRUE)
>              else system2(unzip, args, stdout = TRUE, env = c("TZ=UTC"))
>              ## the two dashed rules delimit the file entries; anchoring
>              ## the match avoids hits on file names that contain "--"
>              dashes <- grep("^[[:space:]]*--", res)
>              rows <- res[(dashes[1] + 1):(dashes[2] - 1)]
>              ## column boundaries are read off the first dashed rule
>              starts <- gregexpr("-+", res[dashes[1]])[[1]]
>              ends <- gregexpr("[[:space:]]+", res[dashes[1]])[[1]]
>              ## substr() is vectorized, so no sapply() is needed
>              z <- data.frame(
>                  Name = substr(rows, starts[4], nchar(rows)),
>                  Length = as.numeric(substr(rows, starts[1], ends[1])),
>                  Date = trimws(substr(rows, starts[2], ends[2])),
>                  Time = trimws(substr(rows, starts[3], ends[3])),
>                  stringsAsFactors = FALSE
>              )
>
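The column-position idea can be tried without an external unzip by running it on a hard-coded listing. The sketch below is only an illustration under stated assumptions: `parse_unzip_listing` is a hypothetical helper name, the sample lines imitate the UnZip 6.00 "-ql" layout rather than being captured output, and other versions or date formats may not fit.

```r
# Hypothetical helper: parse lines in the style of "unzip -ql" using the
# column positions of the first dashed rule (assumes four dashed columns).
parse_unzip_listing <- function(res) {
  rules <- grep("^[[:space:]]*--", res)
  stopifnot(length(rules) >= 2L)
  rule <- res[rules[1L]]
  # Entry lines sit between the two dashed rules; may be empty.
  rows <- if (rules[2L] - rules[1L] > 1L)
    res[(rules[1L] + 1L):(rules[2L] - 1L)] else character()
  starts <- gregexpr("-+", rule)[[1L]]
  stopifnot(length(starts) == 4L)
  ends <- starts + attr(starts, "match.length") - 1L
  data.frame(
    Name   = substr(rows, starts[4L], nchar(rows)),
    Length = as.numeric(substr(rows, starts[1L], ends[1L])),
    Date   = trimws(substr(rows, starts[2L], ends[2L])),
    Time   = trimws(substr(rows, starts[3L], ends[3L])),
    stringsAsFactors = FALSE
  )
}

# Sample listing built with sprintf()/strrep() so the columns line up.
rule <- paste0(strrep("-", 9), "  ", strrep("-", 10), " ",
               strrep("-", 5), "   ", strrep("-", 4))
sample_out <- c(
  "  Length      Date    Time    Name",
  rule,
  sprintf("%9d  %s %s   %s", 5L, "2018-07-04", "12:00", "plain.txt"),
  sprintf("%9d  %s %s   %s", 7L, "2018-07-04", "12:01", "name with spaces.txt"),
  "---------                     -------",
  "       12                     2 files"
)
listing <- parse_unzip_listing(sample_out)
```

Because the name is taken as everything from the fourth column onward, embedded spaces survive. The sketch would still break if a length were wide enough to shift the columns, which is one reason the approach remains tied to a particular output format.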
> I can submit a patch if this is appropriate. I'm really not sure, though,
> because I am new to R-devel. Also, this has the downside of relying on the
> behavior of info-zip unzip, which might change in future versions and is
> unlikely to be the same for other external unzip programs. On the other
> hand, the current code relies on the behavior of info-zip unzip as well,
> and still fails in some cases.
>
> Thanks,
> Paul
>
> P.S.
>
> My sessionInfo is
>
>> sessionInfo()
> R version 3.5.1 (2018-07-02)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Arch Linux
>
> Matrix products: default
> BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.3.1.so
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] devtools_1.13.5
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.1 tools_3.5.1    withr_2.1.2    memoise_1.1.0
> digest_0.6.15
>
> And unzip -v
>
> UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
> bug reports using http://www.info-zip.org/zip-bug.html; see README for
> details.
>
> Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
> see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
>
> Compiled with gcc 5.3.0 for Unix (Linux ELF) on Apr 17 2016.
>
> UnZip special compilation options:
>          ACORN_FTYPE_NFS
>          COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
>          SET_DIR_ATTRIB
>          SYMLINKS (symbolic links supported, if RTL and file system permit)
>          TIMESTAMP
>          UNIXBACKUP
>          USE_EF_UT_TIME
>          USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
>          USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
>          UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
>          LARGE_FILE_SUPPORT (large files over 2 GiB supported)
>          ZIP64_SUPPORT (archives using Zip64 for large files supported)
>          USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.6, 6-Sept-2010)
>          VMS_TEXT_CONV
>          WILD_STOP_AT_DIR
>          [decryption, version 2.11 of 05 Jan 2007]
>
> UnZip and ZipInfo environment options:
>             UNZIP:  [none]
>          UNZIPOPT:  [none]
>           ZIPINFO:  [none]
>        ZIPINFOOPT:  [none]
