[Rd] [RFC] A case for freezing CRAN

Gavin Simpson ucfagls at gmail.com
Thu Mar 20 02:54:57 CET 2014


Given that R has moved to a 12-month release cycle, I don't want to
either i) wait a year to get new packages (or to let users use new
versions of my packages), or ii) have to run R-devel (or be on
R-testing, for that matter) just to use new packages.

People will then start finding ways around these limitations, and
we're back to square one: people using sets of R packages and R
versions that could potentially be all over the place.

As a package developer, it is pretty easy to say "I've tested that my
package works with these other packages at these versions", and to set
DESCRIPTION to allow only those versions (or a range, as the package
matures and the maintainer has tested against more versions of the
dependencies). CRAN may well not like this if your package no longer
builds/checks on their system, but then you have a choice to make:
stick to your reproducibility guns and forsake CRAN in favour of
something else (GitHub, one's own repo), or relent and meet CRAN's
requirements.
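
A minimal sketch of what such a DESCRIPTION might contain, assuming a
hypothetical package "mypkg". The dependency names and version numbers
are purely illustrative, not tested combinations. The Imports entry
pins one exact version; the two mgcv entries in Depends give a lower
and an upper bound, which is the "range" case. How strictly the
installation tools honour operators other than >= is worth checking
against Writing R Extensions rather than taking from this sketch.

```
Package: mypkg
Version: 0.1-0
Title: Hypothetical Package With Pinned Dependencies
Depends:
    R (>= 3.0.2),
    mgcv (>= 1.7-28),
    mgcv (<= 1.7-29)
Imports:
    vegan (== 2.0-10)
```

Widening the range later only means editing the bounds once the
maintainer has tested against further versions.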

On 19 March 2014 16:57, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>
> On 03/19/2014 02:59 PM, Joshua Ulrich wrote:
>>
>> On Wed, Mar 19, 2014 at 4:28 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu>
>> wrote:
>>>
>>> On Wed, Mar 19, 2014 at 11:50 AM, Joshua Ulrich <josh.m.ulrich at gmail.com>
>>> wrote:
>>>>
>>>>
>>>> The suggested solution is not described in the referenced article.  It
>>>> was not suggested that it be the operating system's responsibility to
>>>> distribute snapshots, nor was it suggested to create binary
>>>> repositories for specific operating systems, nor was it suggested to
>>>> freeze only a subset of CRAN packages.
>>>
>>>
>>>
>>> IMO this is an implementation detail. If we could all agree on a
>>> particular set of cran packages to be used with a certain release of
>>> R, then it doesn't matter how the 'snapshotting' gets implemented. It
>>> could be a separate repository, or a directory on cran with symbolic
>>> links, or a page somewhere with hyperlinks to the respective source
>>> packages. Or you can put all packages in a big zip file, or include
>>> it in your OS distribution. You can even distribute your entire repo
>>> on cdroms (debian style!) or do all of the above.
>>>
>>> The hard problem is not implementation. The hard part is that for
>>> reproducibility to work, we need community wide conventions on which
>>> versions of cran packages are used by a particular release of R.
>>> Local downstream solutions are impractical, because this results in
>>> scripts/packages that only work within your niche using this
>>> particular snapshot. I expect that requiring every script be executed
>>> in the context of dependencies from some particular third party
>>> repository will make reproducibility even less common. Therefore I am
>>> trying to make a case for a solution that would naturally improve
>>> reliability/reproducibility of R code without any effort by the
>>> end-user.
>>>
>> So implementation isn't a problem.  The problem is that you need a way
>> to force people not to be able to use different package versions than
>> what existed at the time of each R release.  I said this in my
>> previous email, but you removed and did not address it: "However, you
>> would need to find a way to actively _prevent_ people from installing
>> newer versions of packages with the stable R releases."  Frankly, I
>> would stop using CRAN if this policy were adopted.
>>
>> I suggest you go build this yourself.  You have all the code available
>> on CRAN, and the dates at which each package was published.  If others
>> who care about reproducible research find what you've built useful,
>> you will create the very community you want.  And you won't have to
>> force one single person to change their workflow.
>
>
> Yeah we've already heard this "do it yourself" kind of answer. Not a
> very productive one honestly.
>
> Well actually that's what we've done for the Bioconductor repositories:
> we freeze the BioC packages for each version of Bioconductor. But since
> this freezing doesn't happen at the CRAN level, and many BioC packages
> depend on CRAN packages, the freezing is only at the surface. Would be
> much better if the freezing was all the way down to the bottom of the
> sea. (Note that it is already if you install binary packages only.)
>
> Yes it's technically possible to work around this by also hosting
> frozen versions of CRAN, one per version of Bioconductor, and have
> biocLite() (the tool BioC users use for installing packages) point to
> these frozen versions of CRAN in order to get the correct dependencies
> for any given version of BioC. However we don't do that because that
> would mean extra costs for us in terms of storage space and bandwidth.
> And also because we believe that it would be more effective and would
> ultimately benefit the entire R community (and not just the BioC
> community) if this problem was addressed upstream.
>
>
> H.
>
>>
>> Best,
>> --
>> Joshua Ulrich  |  about.me/joshuaulrich
>> FOSS Trading  |  www.fosstrading.com
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>



-- 
Gavin Simpson, PhD


