[Rd] R CMD build --resave-data

Martin Maechler maechler at stat.math.ethz.ch
Wed Apr 13 14:45:37 CEST 2011


>>>>> Hervé Pagès <hpages at fhcrc.org>
>>>>>     on Tue, 12 Apr 2011 22:21:58 -0700 writes:

    > On 11-04-12 07:06 PM, Simon Urbanek wrote:
    >> 
    >> On Apr 12, 2011, at 8:53 PM, Hervé Pagès wrote:
    >> 
    >>> Hi Uwe,
    >>> 
    >>> On 11-04-11 08:13 AM, Uwe Ligges wrote:
    >>>> 
    >>>> 
    >>>> On 11.04.2011 02:47, Hervé Pagès wrote:
    >>>>> Hi,
    >>>>> 
    >>>>> More about the new --resave-data option
    >>>>> 
    >>>>> As mentioned previously here
    >>>>> 
    >>>>> https://stat.ethz.ch/pipermail/r-devel/2011-April/060511.html
    >>>>> 
    >>>>> 'R CMD build' and 'R CMD INSTALL' handle this new option
    >>>>> inconsistently. The former does --resave-data="gzip" by
    >>>>> default.  The latter doesn't seem to support the
    >>>>> --resave-data= syntax: the --resave-data flag must either be
    >>>>> present or not. And by default 'R CMD INSTALL' won't resave
    >>>>> the data.
    >>>>> 
    >>>>> Also, because now 'R CMD build' is resaving the data,
    >>>>> shouldn't it reinstall the package in order to be able to do
    >>>>> this correctly?
    >>>>> 
    >>>>> Here is why. There is this new warning in 'R CMD check' that
    >>>>> complains about files not of a type allowed in a 'data'
    >>>>> directory:
    >>>>> 
    >>>>> 
    >>>>> http://bioconductor.org/checkResults/2.8/bioc-LATEST/Icens/lamb1-checksrc.html
    >>>>> 
    >>>>> 
    >>>>> 
    >>>>> The Icens package also has .R files under data/ with things
    >>>>> like:
    >>>>> 
    >>>>> bet <- matrix(scan("CMVdata", quiet = TRUE), ncol = 5, byrow = TRUE)
    >>>>> 
    >>>>> i.e. the R code needs to access some of the text files
    >>>>> located in the data/ folder. So in order to get rid of this
    >>>>> warning I tried to move those text files to inst/extdata/
    >>>>> and I modified the code in the .R file so it does:
    >>>>> 
    >>>>> CMVdata_filepath <- system.file("extdata", "CMVdata", package = "Icens")
    >>>>> bet <- matrix(scan(CMVdata_filepath, quiet = TRUE), ncol = 5, byrow = TRUE)
    >>>>> 
    >>>>> But now 'R CMD build' fails to resave the data because the
    >>>>> package was not installed first and the CMVdata file could
    >>>>> not be found.
    >>>>> 
    >>>>> Unfortunately, for a lot of people that means that the safe
    >>>>> way to build a source tarball now is with
    >>>>> 
    >>>>> R CMD build --keep-empty-dirs --no-resave-data
    >>>> 
    >>>> 
    >>>> Hervé,
    >>>> 
    >>>> actually it makes some sense to have these defaults from a
    >>>> CRAN maintainer's point of view:
    >>>> 
    >>>> --keep-empty-dirs: we found many packages containing empty
    >>>> dirs unnecessarily and the idea is to exclude them at the
    >>>> build stage rather than at the later installation stage. Note
    >>>> that the package maintainer is supposed to run build (and
    >>>> knows whether the empty dirs are to be included; the user who
    >>>> runs INSTALL does not).
    >>>> 
    >>>> --no-resave-data: We found many packages with insufficiently
    >>>> compressed data. This should be fixed when building the
    >>>> package, not later when installing it, since the reduced size
    >>>> is useful in the source tarball already.
    >>>> 
    >>>> So it does make some sense to have different defaults in
    >>>> build as opposed to INSTALL from my point of view (although I
    >>>> could live with different defaults, though).
    >>> 
    >>> If you deliberately ignore the fact that 'R CMD INSTALL' is
    >>> also used by developers to install from the *package source
    >>> tree* (as opposed to end users who use it to install from a
    >>> *source tarball*,
    >> 
    >> .. for a good reason, IMHO no serious developer would do that
    >> for obvious reasons -

    > This sounds like saying that no serious developer working on a
    > big project involving a lot of files to compile should use
    > 'make'.  I mean, serious developers like you *always* do 'make
    > clean' before they do 'make' on the R tree when they need to
    > test a change, even a small one? And this only takes a "fraction
    > of a second" for them? Hey, I'd love to be able to do that too!
    > ;-)

    > H.

    >> you'd be working on a dirty copy creating many unnecessary
    >> problems and polluting your sources. The first time you'll
    >> spend an hour chasing a non-existent problem due to stale
    >> binary objects in your tree you'll learn that lesson ;). The
    >> fraction of a second spent in R CMD build is well worth the
    >> hours saved. IMHO the only valid reason to run INSTALL on a
    >> (freshly unpacked tar ball) directory is to capture config.log.
    >> 
    >> Cheers, Simon
    >> 
    >> 
    >> 
    >>> even though they don't use it directly), then you have a
    >>> point. So maybe I should have been more explicit about the
    >>> problem that it can be for the *developer* to have 'R CMD
    >>> build' and 'R CMD INSTALL' behave differently by default.
    >>> 
    >>> Of course I'm not suggesting that 'R CMD INSTALL' should
    >>> behave differently (by default) depending on whether it's used
    >>> on a source tarball (mode 1) or a package source tree (mode
    >>> 2).
    >>> 
    >>> I'm suggesting that, by default, the 3 commands (R CMD build +
    >>> R CMD INSTALL in mode 1 and 2) behave consistently.
    >>> 
    >>> With the latest changes, and by default, 'R CMD INSTALL' is
    >>> still doing the right thing, but not 'R CMD build' anymore.
    >>> 
    >>> I perfectly understand the intention behind those new flags,
    >>> which is to try to "optimize" the resulting source tarball but
    >>> what would you think if 'gcc' had some optimization flags that
    >>> can generate broken executables (under some circumstances) and
    >>> if these flags were enabled by default?
    >>> 
    >>> Note that I would have no problem with 'R CMD build' trying to
    >>> resave the data by default if the current implementation of
    >>> that feature was working properly, but unfortunately it's
    >>> broken (see my previous email for the details).
    >>> 
    >>> Thanks, H.
    >>> 
    >>>> 
    >>>> If you need further arguments for the discussion: I also tend to use
    >>>> --no-vignettes nowadays if my code does not change considerably. ;-)
    >>>> 
    >>>> Best wishes,
    >>>> Uwe
    >>>> 
    >>>> 
    >>>> 
    >>>>> I hope the list of options/flags that we need to use to "fix" 'R CMD
    >>>>> build' (and make it consistent with R CMD INSTALL) is not going to
    >>>>> grow too much ;-)
;-)

I'm with Herve here.
I almost always use R CMD INSTALL on a directory rather than a
tarball... though most of the time the directory is freshly
untarred.
At other times, however, one of the reasons is exactly that I can
keep things around (*.o, ...) which only need to be rebuilt very
rarely.

Martin
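The Icens failure described in the thread comes down to how 'system.file()' behaves when a package has not been installed yet: it does not signal an error, it returns the empty string "", so the following 'scan()' call never receives a usable path. The sketch below is not the Icens code; the helper name 'find_cmvdata' is hypothetical. It only makes that failure explicit when a data/*.R script is sourced before the package is installed, which is what happens when 'R CMD build' resaves the data from a source tree.

```r
# Hypothetical helper (not part of Icens): locate the CMVdata file that
# the package ships under inst/extdata/ (installed as extdata/).
# system.file() returns "" rather than raising an error when the
# package or the file cannot be found, e.g. while 'R CMD build' sources
# data/*.R before the package has ever been installed.
find_cmvdata <- function(pkg = "Icens") {
  path <- system.file("extdata", "CMVdata", package = pkg)
  if (!nzchar(path))
    stop("CMVdata not found: is package '", pkg, "' installed?")
  path
}

# Usage inside a data/*.R script, once the package is installed:
# bet <- matrix(scan(find_cmvdata(), quiet = TRUE), ncol = 5, byrow = TRUE)
```

Passing 'mustWork = TRUE' to 'system.file()' gives a similar, terser "no file found" error; neither variant makes the file available to 'R CMD build' before installation, it only turns a confusing failure into an explicit one.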


