[Rd] possible internal (un)tar bug

Martin Maechler maechler at stat.math.ethz.ch
Tue May 1 18:45:00 CEST 2018


TL;DR: Use gzfile(), not file(), and you will have no problems.

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Tue, 1 May 2018 16:39:57 +0200 writes:

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Tue, 1 May 2018 16:14:43 +0200 writes:

>>>>> Gábor Csárdi <csardi.gabor at gmail.com>
>>>>>     on Tue, 1 May 2018 12:05:32 +0000 writes:

    >>> This is a not-too-old R-devel on Linux; it already fails
    >>> in R 3.4.4, and on macOS as well.

    >> and fails in considerably older R versions, too.

    >> Basically untar() seems to fail on a connection, but works
    >> fine on a plain file name.

    > Well, there's an easy workaround: if you want to use a
    > connection (instead of a simple file name) with untar() and want
    > to use compression (as in the example), you can currently
    > do that by making sure the connection is a "gzcon" one:

    > ##=========>  Workaround for now:

    > ## Create :
    > setwd(tempdir()) ; dir.create("pkg")
    > cat("this: that\n", file = file.path("pkg", "DESCRIPTION"))
    > tf <- "pkg_1.0.tar.gz"
    > tar(tf, "pkg", compression = "gzip", tar = "internal")
    > unlink("pkg", recursive = TRUE)

    > ## As it is a compressed tar file, use it via a gzcon() connection,
    > ## and both cases work fine:
    > con <- gzcon(file(tf, open = "rb")) ; (f <- untar(con, list = TRUE)) ; close(con)
    > ##     ~~~~~
    > con <- gzcon(file(tf, open = "rb")) ; untar(con, files = f) ; close(con)
    > stopifnot(identical(f, "pkg/DESCRIPTION"),
    >           file.exists(f))
    > unlink(c(tf,"pkg"), recursive = TRUE) # clean up after me

Actually, much better than gzcon(file(....)) is gzfile(....).
gzfile() works for all compression types that are supported by
tar(), not just for gzip compression.
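A minimal sketch of that claim, assuming a scratch directory from tempfile() and illustrative file names: it writes the same small tarball with each compression type tar() offers and lists every one through a gzfile() connection opened in binary read mode. It relies on gzfile() also reading uncompressed, bzip2- and xz-compressed files, as documented in ?gzfile. The connection is closed explicitly rather than relying on untar() to do so.

```r
## Sketch: list one small tarball, written with each compression type that
## tar() supports, through the same kind of connection, gzfile().
## The scratch directory and file names are illustrative assumptions.
td <- tempfile("tardemo")
dir.create(td)
owd <- setwd(td)
dir.create("pkg")
cat("this: that\n", file = file.path("pkg", "DESCRIPTION"))

exts <- c(none = "tar", gzip = "tar.gz", bzip2 = "tar.bz2", xz = "tar.xz")
listed <- list()
for (type in names(exts)) {
  tf <- paste0("pkg_1.0.", exts[[type]])
  tar(tf, "pkg", compression = type, tar = "internal")
  ## In "rb" mode gzfile() also reads uncompressed, bzip2- and
  ## xz-compressed files, so the same call works for every type.
  con <- gzfile(tf, open = "rb")
  listed[[type]] <- untar(con, list = TRUE)
  close(con)  # close it ourselves rather than rely on untar()
}

setwd(owd)
unlink(td, recursive = TRUE)
str(listed)
```

A connection is consumed by one untar() call, so extracting after listing needs a freshly opened connection, as in the workaround above.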

For now, I'd conclude that the bug is mostly in the
documentation and in the unhelpful error message.

We could try to "fix" your use case by wrapping the connection
in gzcon(.), and that also works for uncompressed tar files.
However, it fails for the newer compression schemes (bzip2, xz),
all of which are supported via gzfile().
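The gzcon() trade-off can be seen in a small sketch (scratch directory and file names are illustrative assumptions; list_via_gzcon() is a hypothetical helper). gzcon() passes input without a gzip header through unchanged, which is why wrapping is harmless for a plain .tar, but an xz-compressed tarball then reaches untar() still compressed and is expected to fail.

```r
## Sketch: wrap file() connections in gzcon() and list two tarballs, one
## uncompressed and one xz-compressed. Scratch names are illustrative.
td <- tempfile("tardemo")
dir.create(td)
owd <- setwd(td)
dir.create("pkg")
cat("this: that\n", file = file.path("pkg", "DESCRIPTION"))
tar("plain.tar", "pkg", compression = "none", tar = "internal")
tar("packed.tar.xz", "pkg", compression = "xz", tar = "internal")

## Hypothetical helper: member names, or NULL if untar() signals an error.
list_via_gzcon <- function(path) {
  con <- gzcon(file(path, open = "rb"))
  on.exit(close(con))  # closing the gzcon() also closes the file()
  tryCatch(untar(con, list = TRUE), error = function(e) NULL)
}
plain <- list_via_gzcon("plain.tar")     # gzcon() passes plain data through
xz    <- list_via_gzcon("packed.tar.xz") # expected NULL: bytes stay xz-compressed

setwd(owd)
unlink(td, recursive = TRUE)
```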

I propose to commit the following changes:

1) change the documentation of untar() to say that a connection
   to a compressed tar file should be created by gzfile().
2) in the case of a connection that gives the "block error",
   the error message would become more helpful and mention gzfile().
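Change 2) could look roughly like the following hypothetical sketch. read_tar_block() is an illustrative helper, not the actual code in R's untar2(): it reads one 512-byte tar block and, when the stream ends early (the typical symptom of handing untar() a still-compressed file() connection), stops with a message pointing at gzfile(). The scratch directory and file names are assumptions.

```r
## Hypothetical helper (not R's actual untar2() source): read one 512-byte
## tar block and, if the stream ends early, point the user at gzfile().
read_tar_block <- function(con) {
  block <- readBin(con, what = "raw", n = 512L)
  if (length(block) > 0L && length(block) < 512L)
    stop("incomplete block: rather use gzfile(.) created connection?",
         call. = FALSE)
  block
}

td <- tempfile("tardemo")
dir.create(td)
owd <- setwd(td)
dir.create("pkg")
cat("this: that\n", file = file.path("pkg", "DESCRIPTION"))
tar("pkg_1.0.tar.gz", "pkg", compression = "gzip", tar = "internal")

## file() hands over the still-compressed bytes, far fewer than 512 here
con <- file("pkg_1.0.tar.gz", open = "rb")
msg <- tryCatch({ read_tar_block(con); "" },
                error = function(e) conditionMessage(e))
close(con)

## gzfile() decompresses, so a full 512-byte header block comes back
con <- gzfile("pkg_1.0.tar.gz", open = "rb")
n <- length(read_tar_block(con))
close(con)

setwd(owd)
unlink(td, recursive = TRUE)
```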

Currently (with change 2 applied in my local copy):

> con <- file(tf, open = "rb"); try( untar(con, list = TRUE) ) ## -> Error
Error in untar2(tarfile, files, list, exdir, restore_times) : 
  incomplete block: rather use gzfile(.) created connection?
> 

Feedback (from anyone)?

Martin



