[Rd] [RFC] A case for freezing CRAN

Hervé Pagès hpages at fhcrc.org
Wed Mar 19 22:00:23 CET 2014


On 03/19/2014 07:00 AM, Kasper Daniel Hansen wrote:
> Our experience in Bioconductor is that this is a pretty hard problem.

What's hard and requires a substantial amount of human resources is
running our build system (setting up the build machines, keeping up
with changes in R, babysitting the builds, assisting developers with
build issues, etc.).

But *freezing* the CRAN packages for each version of R is *very* easy
to do. The CRAN maintainers already do it for the binary packages.
What could be the reason for not doing it for source packages too?
Maybe in prehistoric times there was this belief that a source package
was meant to remain compatible with all versions of R, present and
future, but that dream is dead and gone...

Right now the layout of the CRAN package repo is:

   ├── src
   │   └── contrib
   └── bin
       ├── windows
       │   └── contrib
       │       ├ ...
       │       ├ 3.0
       │       ├ 3.1
       │       ├ ...
       └── macosx
           └── contrib
               ├ ...
               ├ 3.0
               ├ 3.1
               ├ ...

when it could be:

   ├── 3.0
   │   ├── src
   │   │   └── contrib
   │   └── bin
   │       ├── windows
   │       │   └── contrib
   │       └── macosx
   │           └── contrib
   ├── 3.1
   │   ├── src
   │   │   └── contrib
   │   └── bin
   │       ├── windows
   │       │   └── contrib
   │       └── macosx
   │           └── contrib
   ├── ...

That is: the split by version is done at the top, not at the bottom.
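
To make the difference between the two layouts concrete, here is a
minimal sketch in R. The helper contrib_path() is hypothetical (it is
not part of R or CRAN); it only maps a package type and an R version to
the relative contrib directory shown in the two trees above.

```r
# Hypothetical helper, for illustration only: relative contrib directory
# for a package type and R version under each of the two layouts.
contrib_path <- function(r_version,
                         type = c("source", "windows", "macosx"),
                         layout = c("current", "proposed")) {
  type <- match.arg(type)
  layout <- match.arg(layout)

  # Directory that holds the packages, without any version component.
  leaf <- if (type == "source") {
    "src/contrib"
  } else {
    file.path("bin", type, "contrib")
  }

  if (layout == "proposed") {
    # Version split at the top: every type lives under <version>/.
    return(file.path(r_version, leaf))
  }

  # Current layout: only binaries are split by version, at the bottom.
  if (type == "source") leaf else file.path(leaf, r_version)
}

contrib_path("3.0", "source",  "current")   # "src/contrib"
contrib_path("3.0", "windows", "current")   # "bin/windows/contrib/3.0"
contrib_path("3.0", "source",  "proposed")  # "3.0/src/contrib"
contrib_path("3.0", "macosx",  "proposed")  # "3.0/bin/macosx/contrib"
```

Note that under the current layout the source path does not depend on
the R version at all, which is exactly the point being made here.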

It doesn't use more disk space than the current layout (you can just
throw the src/contrib/Archive/ folder away; there is no longer any
need for it).

install.packages() and family would need to be modified a little bit
to work with this new layout. And that's all!
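
As a rough sketch of how small that change could be: install.packages()
already accepts a 'contriburl' argument, so a thin wrapper can point it
at a per-version subtree. Everything named below (the root URL
https://cran.example.org, r_series(), frozen_contrib_url(),
install_frozen()) is hypothetical and only for illustration; it also
assumes the per-version src/contrib directory carries its own PACKAGES
index.

```r
# Hypothetical root of a repository that uses the version-first layout.
frozen_root <- "https://cran.example.org"

# Release series of the running R ("3.0", "3.1", ...): the major version
# plus the first component of the minor version, dropping the patch level.
r_series <- function() {
  paste(R.version$major, sub("\\..*$", "", R.version$minor), sep = ".")
}

# Source contrib URL under the proposed layout. This path scheme is an
# assumption taken from the tree above, not an existing CRAN URL.
frozen_contrib_url <- function(root = frozen_root, series = r_series()) {
  paste(root, series, "src", "contrib", sep = "/")
}

# Thin wrapper: hand the per-version URL to install.packages() through
# its existing 'contriburl' argument.
install_frozen <- function(pkgs, ...) {
  install.packages(pkgs, contriburl = frozen_contrib_url(),
                   type = "source", ...)
}

frozen_contrib_url(series = "3.0")
# "https://cran.example.org/3.0/src/contrib"
```

available.packages() and update.packages() take the same 'contriburl'
argument, so the rest of the family could presumably be pointed at the
frozen subtree in the same way.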

The never-ending changes in Mac OS X binary formats can be handled
in a cleaner way, i.e. no more symlinks under bin/macosx to keep
backward compatibility with different binary formats and with old
versions of install.packages().

Then, 10 years from now, you can reproduce an analysis that you
did today with R-3.0, because when you install R-3.0 and the
packages required for this analysis, you'll end up with exactly
the same packages as today.


> What the OP presumably wants is some guarantee that all packages on CRAN
> work well together.  A good example is when Rcpp was updated, it broke
> other packages (quick note: The Rcpp developers do an incredible amount of
> work to deal with this; it is almost impossible to not have a few days of
> chaos).  Ensuring this is not a trivial task, and it requires some buy-in
> both from the "repository" and from the developers.
> For Bioconductor it is even harder as the dependency graph of Bioconductor
> is much more involved than the one for CRAN, where most packages depend
> only on a few other packages.  This is why we need to do this for Bioc.
> Based on my experience with CRAN I am not sure I see a need for a
> coordinated release (or rather, I can sympathize with the need, but I don't
> think the effort is worth it).
> What would be more useful in terms of reproducibility is the capability of
> installing a specific version of a package from a repository using
> install.packages(), which would require archiving older versions in a
> coordinated fashion. I know CRAN archives old versions, but I am not aware
> if we can programmatically query the repository about this.
> Best,
> Kasper
> On Wed, Mar 19, 2014 at 8:52 AM, Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
>> On Tue, Mar 18, 2014 at 3:24 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu>
>> wrote:
>> <snip>
>>> ## Summary
>>> Extending the r-release cycle to CRAN seems like a solution that would
>>> be easy to implement. Package updates simply only get pushed to the
>>> r-devel branches of cran, rather than r-release and r-release-old.
>>> This separates development from production/use in a way that is common
>>> sense in most open source communities. Benefits for R include:
>> Nothing is ever as simple as it seems (especially from the perspective
>> of one who won't be doing the work).
>> There is nothing preventing you (or anyone else) from creating
>> repositories that do what you suggest.  Create a CRAN mirror (or more
>> than one) that only include the package versions you think they
>> should.  Then have your production servers use it (them) instead of
>> CRAN.
>> Better yet, make those repositories public.  If many people like your
>> idea, they will use your new repositories instead of CRAN.  There is
>> no reason to impose this change on all world-wide CRAN users.
>> Best,
>> --
>> Joshua Ulrich  |  about.me/joshuaulrich
>> FOSS Trading  |  www.fosstrading.com
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
