[Rd] [RFC] A case for freezing CRAN

Thu Mar 20 03:17:18 CET 2014

Michael,

I think the issue is that Jeroen wants to take that responsibility out
of the hands of the person trying to reproduce a work. If it used R
3.0.x and packages A, B and C then it would be trivial to install
that version of R and then pull down the stable versions of A, B and C
for that version of R. At the moment, one might note the packages used
and even their versions, but what about the versions of the packages
that the used packages rely upon & so on? What if developers don't
state known working versions of dependencies?

The problem is how the heck do you know which versions of packages are
needed if developers don't record these dependencies in sufficient
detail? The suggested solution is to freeze CRAN at intervals
alongside R releases. Then you'd know what the stable versions were.

Or we could just get package developers to be more thorough in
documenting dependencies. Or R CMD check could refuse to pass if a
package is listed as a dependency but with no version qualifiers. Or
have R CMD build add an upper bound (taken from the version of each
dependency that is current on CRAN at build time) if the package
developer didn't include an upper bound. Or... The first is unlikely
to happen
consistently, and no-one wants *more* checks and hoops to jump through
:-)
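
To make the "no version qualifiers" idea concrete, here is a rough
sketch of the kind of test I have in mind. It is not anything R CMD
check does today; unversioned_deps() is a name I've made up, and it
just takes the text of a Depends/Imports field and reports the entries
that carry no version qualifier.

```r
# Hypothetical helper: return the dependencies in a DESCRIPTION-style
# field (e.g. the value of Depends or Imports) that have no version
# qualifier such as "(>= 1.2.0)".
unversioned_deps <- function(field) {
  # Split the comma-separated field into individual entries
  entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  entries <- entries[nzchar(entries)]
  # An entry is "qualified" if it contains "(<operator> <version>)"
  pattern <- "\\([[:space:]]*(>=|<=|==|!=|>|<)[[:space:]]*[0-9][0-9.-]*[[:space:]]*\\)"
  entries[!grepl(pattern, entries)]
}

unversioned_deps("R (>= 3.0.0), A (>= 1.2.0), B, C")
# [1] "B" "C"
```

A check built on something like this would complain about B and C
above, which is exactly the sort of extra hoop I suspect people won't
thank us for.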

To my mind it is incumbent upon those wanting reproducibility to build
the tools to enable users to reproduce works. When you write a paper
or release a tool, you will have tested it with a specific set of
packages. It is relatively easy to work out what those versions are
(there are tools in R for this). What is required is an automated way
to record that info in an agreed upon way in an approved
file/location, and have a tool that facilitates setting up a package
library sufficient to reproduce a work. That approval
doesn't need to come from CRAN or R Core - we can store anything in
./inst.
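
As a minimal sketch of what I mean (the function names and the file
layout are my own invention, not an agreed standard), one could write
the versions of the packages loaded in the session to a DCF file and
later compare that record against whatever is installed. Only base R
functions (loadedNamespaces, packageVersion, write.dcf, read.dcf) are
used.

```r
# Hypothetical helper: record the name and version of every package
# whose namespace is loaded in the current session, one DCF record
# per package.
record_versions <- function(file) {
  pkgs <- sort(loadedNamespaces())
  vers <- vapply(pkgs, function(p) as.character(packageVersion(p)),
                 character(1))
  write.dcf(data.frame(Package = pkgs, Version = unname(vers),
                       stringsAsFactors = FALSE),
            file = file)
  invisible(file)
}

# Hypothetical helper: compare a recorded file with what is installed
# now. Packages that are not installed get NA in the Installed column.
check_versions <- function(file) {
  want <- read.dcf(file, fields = c("Package", "Version"))
  have <- vapply(want[, "Package"], function(p) {
    tryCatch(as.character(packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))
  data.frame(Package   = want[, "Package"],
             Recorded  = want[, "Version"],
             Installed = unname(have),
             Match     = !is.na(have) & have == want[, "Version"],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Usage: a temporary file stands in for something kept under ./inst
f <- tempfile(fileext = ".dcf")
record_versions(f)
check_versions(f)
```

This only reports mismatches; actually installing the recorded
versions into a fresh library is the harder part and is left out
here.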

Reproducibility is a very important part of doing "science", but not
everyone using CRAN is doing that. Why force everyone to march to the
reproducibility drum? I would place the onus elsewhere to make this
work.

Gavin
A scientist, very much interested in the reproducibility of my work and others'.

On 19 March 2014 19:55, Michael Weylandt <michael.weylandt at gmail.com> wrote:
>
>
> On Mar 19, 2014, at 18:42, Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
>
>> On Wed, Mar 19, 2014 at 5:16 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:
>>> On Wed, Mar 19, 2014 at 2:59 PM, Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
>>>>
>>>> So implementation isn't a problem.  The problem is that you need a way
>>>> to force people not to be able to use different package versions than
>>>> what existed at the time of each R release.  I said this in my
>>>> previous email, but you removed and did not address it: "However, you
>>>> would need to find a way to actively _prevent_ people from installing
>>>> newer versions of packages with the stable R releases."  Frankly, I
>>>> would stop using CRAN if this policy were adopted.
>>>
>>> I am not proposing to "force" anything to anyone, those are your
>>> words. Please read the proposal more carefully before derailing the
>>> discussion. Below *verbatim* a section from the paper:
>> <snip>
>>
>> Yes "force" is too strong a word.  You want a barrier (however small)
>> to prevent people from installing newer (or older) versions of
>> packages than those that correspond to a given R release.
>
>
> Jeroen,
>
> Reading this thread again, is it a fair summary of your position to say "reproducibility by default is more important than giving users access to the newest bug fixes and features by default?" It's certainly arguable, but I'm not sure I'm convinced: I'd imagine that the ratio of new work being done vs reproductions is rather high and the current setup optimizes for that already.
>
> What I'm trying to figure out is why the standard "install the following list of package versions" isn't good enough in your eyes? Is it the lack of CRAN provided binaries or the fact that the user has to proactively set up their environment to replicate that of published results?
>
> In your XML example, it seems the problem was that the reproducer didn't check that the same package versions as the reproducee and instead assumed that 'latest' would be the same. Annoying yes, but easy to solve.
>
> Michael
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Gavin Simpson, PhD