[Rd] [RFC] A case for freezing CRAN

Hervé Pagès hpages at fhcrc.org
Thu Mar 20 20:14:18 CET 2014


On 03/20/2014 03:52 AM, Duncan Murdoch wrote:
> On 14-03-20 2:15 AM, Dan Tenenbaum wrote:
>>
>>
>> ----- Original Message -----
>>> From: "David Winsemius" <dwinsemius at comcast.net>
>>> To: "Jeroen Ooms" <jeroen.ooms at stat.ucla.edu>
>>> Cc: "r-devel" <r-devel at r-project.org>
>>> Sent: Wednesday, March 19, 2014 11:03:32 PM
>>> Subject: Re: [Rd] [RFC] A case for freezing CRAN
>>>
>>>
>>> On Mar 19, 2014, at 7:45 PM, Jeroen Ooms wrote:
>>>
>>>> On Wed, Mar 19, 2014 at 6:55 PM, Michael Weylandt
>>>> <michael.weylandt at gmail.com> wrote:
>>>>> Reading this thread again, is it a fair summary of your position
>>>>> to say "reproducibility by default is more important than giving
>>>>> users access to the newest bug fixes and features by default?"
>>>>> It's certainly arguable, but I'm not sure I'm convinced: I'd
>>>>> imagine that the ratio of new work being done vs reproductions is
>>>>> rather high and the current setup optimizes for that already.
>>>>
>>>> I think that separating development from released branches can give
>>>> us both reliability/reproducibility (stable branch) as well as new
>>>> features (unstable branch). The user gets to pick (and you can pick
>>>> both!). The same is true for r-base: when using a 'released' version
>>>> you get 'stable' base packages that are up to 12 months old. If you
>>>> want to have the latest stuff you download a nightly build of
>>>> r-devel. For regular users and reproducible research it is
>>>> recommended to use the stable branch. However if you are a developer
>>>> (e.g. package author) you might want to develop/test/check your work
>>>> with the latest r-devel.
>>>>
>>>> I think that extending the R release cycle to CRAN would result both
>>>> in more stable released versions of R, as well as more freedom for
>>>> package authors to implement rigorous changes in the unstable branch.
>>>> When writing a script that is part of a production pipeline, or a
>>>> Sweave paper that should be reproducible 10 years from now, or a book
>>>> on using R, you use a stable version of R, which is guaranteed to
>>>> behave the same over time. However when developing packages that
>>>> should be compatible with the upcoming release of R, you use r-devel
>>>> which has the latest versions of other CRAN and base packages.
>>>
>>>
>>> As I remember ... The example demonstrating the need for this was an
>>> XML package that caused an extract from a website where the headers
>>> were misinterpreted as data in one version of pkg:XML and not in
>>> another. That seems fairly unconvincing. Data cleaning and
>>> validation is a basic task of data analysis. It also seems excessive
>>> to assert that it is the responsibility of CRAN to maintain a synced
>>> binary archive that will be available in ten years.
>>
>>
>> CRAN already does this: the bin/windows/contrib directory has
>> subdirectories going back to 1.7, with packages dated October 2004. I
>> don't see why it is burdensome to continue to archive these. It would
>> be nice if source versions had a similar archive.
>
> The bin/windows/contrib directories are updated every day for active R
> versions.  It's only when Uwe decides that a version is no longer worth
> active support that he stops doing updates, and it "freezes".  A
> consequence of this is that the snapshots preserved in those older
> directories are unlikely to match what someone who keeps up to date with
> R releases is using.  Their purpose is to make sure that those older
> versions aren't completely useless, but they aren't what Jeroen was
> asking for.

But getting arbitrary package versions is almost completely useless
from a reproducibility point of view. For example, if some people try
to use R-2.13.2 today to reproduce an analysis that was published
2 years ago, they'll get Matrix 1.0-4 on Windows, Matrix 1.0-3 on Mac,
and Matrix 1.1-2-2 on Unix. And of course none of these is what the
authors of the paper used (they used Matrix 1.0-1, which was current
when they ran their analysis).

A big improvement from a reproducibility point of view would be to
(a) have a clear cutoff for the freezes, (b) freeze the source
packages as well as the binary packages, and (c) freeze the same
versions of source and binaries. For example the freeze of
bin/windows/contrib/x.y, bin/macosx/contrib/x.y and contrib/x.y
could happen when the R-x.y series itself freezes (i.e. no more
minor versions planned for this series).
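
To make the freeze rule concrete, here is a minimal sketch in Python
(chosen only for illustration; it is not R code and not anything CRAN
actually runs). It assumes each package has a known release history and
that a single freeze date is chosen for the R-x.y series. The package
names, versions, dates, and the helper frozen_snapshot() are all
hypothetical.

```python
from datetime import date

# Hypothetical release history: package -> list of (version, release date).
# Names, versions, and dates are invented for illustration only.
RELEASES = {
    "pkgA": [("1.0-1", date(2011, 9, 1)),
             ("1.0-3", date(2012, 1, 15)),
             ("1.1-2", date(2014, 1, 10))],
    "pkgB": [("0.9", date(2011, 5, 2)),
             ("1.0", date(2012, 6, 30))],
}


def frozen_snapshot(releases, freeze_date):
    """Map each package to its latest version released on or before freeze_date."""
    snapshot = {}
    for pkg, history in releases.items():
        eligible = [(released, version) for version, released in history
                    if released <= freeze_date]
        if eligible:
            # Pick the entry with the most recent release date.
            snapshot[pkg] = max(eligible, key=lambda entry: entry[0])[1]
    return snapshot


# Assumed freeze date for the hypothetical R-x.y series.
FREEZE = date(2012, 3, 1)

# Points (b) and (c): apply the same cutoff to source and binary trees.
DIRS = ["contrib/x.y", "bin/windows/contrib/x.y", "bin/macosx/contrib/x.y"]
snapshots = {d: frozen_snapshot(RELEASES, FREEZE) for d in DIRS}
```

Because every directory is derived from the same cutoff, all three end
up with identical package versions, which is the property that the
current per-platform freezes lack.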

Cheers,
H.

>
> Karl Millar's suggestion seems like an ideal solution to this problem.
> Any CRAN mirror could implement it.  If someone sets this up and commits
> to maintaining it, I'd be happy to work on the necessary changes to the
> install.packages/update.packages code to allow people to use it from
> within R.
>
> Duncan Murdoch
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


