[Rd] [RFC] A case for freezing CRAN

Jari Oksanen jari.oksanen at oulu.fi
Fri Mar 21 10:17:24 CET 2014


On 21/03/2014, at 10:40 AM, Rainer M Krug wrote:

> 
> 
> This is a long and (mainly) interesting discussion, which is fanning out
> in many different directions, and I think many are not that relevant to
> the OP's suggestion. 
> 
> I see the advantages of having such a dynamic CRAN, but also of having a
> more stable CRAN. I prefer CRAN as it is now, but in many cases a more
> stable CRAN might be an advantage. So having releases of CRAN might make
> sense. But then there is the archiving issue of CRAN.
> 
> The suggestion was made to move the responsibility away from CRAN and
> the R infrastructure to the user / researcher to guarantee that the
> results can be re-run years later. It would be nice to have this built
> into CRAN, but let's stick to the scenario that the user should care for
> reproducibility.

There are two different problems that alternate in the discussion: reproducibility and breakage of CRAN dependencies. A frozen CRAN could make *approximate* reproducibility easier to achieve, but real reproducibility needs stricter solutions. The actual sessionInfo() output is the minimum information needed, but rebuilding a spitting image of the old environment may still be demanding (though in many cases this does not matter).

Another problem is that CRAN is so volatile that new versions of packages break other packages or old scripts. Here the main problem is how package developers work. Freezing CRAN would not change that: if package maintainers release breaking code, that breaking code would be frozen. I think that most packages do not distinguish between development and release branches, and CRAN policy won't change that.

I can sympathize with package maintainers who have 150 reverse dependencies. My main package has only ~50, and I certainly won't test them all with each new release. I have sometimes tried, but I could not even get all of them built, because they had other dependencies on packages that failed. Even those that I could test failed to detect problems (in one case all examples were wrapped in \dontrun, so the package passed its tests nicely). I only wish that if people *really* depend on my package, they would test it against the R-Forge version and alert me before CRAN releases, but that is not very likely (I guess many dependencies are not *really* necessary and only concern marginal features of the package, but CRAN forces maintainers to declare them).

Still a few words about reproducibility of scripts: this can hardly be achieved with good coverage, because many scripts are so very ad hoc. When I edit and review manuscripts for journals, I very often get Sweave or knitr scripts that "just work", where "just" means "just so and so". Often they do not work at all, because they rely on some undeclared private functionality or stray files in the author's workspace that did not travel with the Sweave document. I think these -- published scientific papers -- are the main field where the code really should be reproducible, but they are often the hardest to reproduce. Nothing the CRAN people do can help with the sloppy code scientists write for publications. You know, they are scientists -- not engineers.

Cheers, Jari Oksanen
> 
> Leaving the issue of compilation out, a package which creates a
> custom installation of the R version used, including the source of that
> R version and the sources of the packages in a format that can be
> compiled on Linux, given that the relevant dependencies are installed,
> would be a huge step forward.
> 
> I know - compilation on Windows (and sometimes Mac) is a serious
> problem, but to archive *all* binaries and to re-compile all older
> versions of R and all packages would be an impossible task.
> 
> Apart from that - doing your analysis in a Virtual Machine and then
> simply archiving this Virtual Machine would also be an option, but only
> for the more tech-savvy users.
> 
> In a nutshell: I think a package could provide a solution for local
> archiving that makes it possible to re-run the simulation with the
> same tools at a later stage - although guarantees would not be
> possible.
> 
> Cheers,
> 
> Rainer
> -- 
> Rainer M. Krug
> email: Rainer<at>krugs<dot>de
> PGP: 0x0F52F982
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


