[Rd] [RFC] A case for freezing CRAN

Philippe Grosjean phgrosjean at sciviews.org
Fri Mar 21 10:06:27 CET 2014

This is becoming an extremely long thread, and it is going in too many directions. However, I would like to mention here our ongoing five-year ECOS project for the study of Open Source ecosystems, CRAN among them. You can find more information here: http://informatique.umons.ac.be/genlog/projects/ecos/. We are now in the second year.

We are currently working on CRAN maintainability questions. See:

- Claes Maelick, Mens Tom, Grosjean Philippe, "On the maintainability of CRAN packages", in IEEE CSMR-WCRE 2014 Software Evolution Week, Antwerp, Belgium (2014)

- Mens Tom, Claes Maelick, Grosjean Philippe, Serebrenik Alexander, "Studying Evolving Software Ecosystems based on Ecological Models", in Mens Tom, Serebrenik Alexander, Cleve Anthony (eds.), "Evolving Software Systems", Springer, ISBN 978-3-642-45397-7 (2014)

Currently, we are building an Open Source system based on VirtualBox and Vagrant to recreate a virtual machine under Linux (Debian and Ubuntu are considered for the moment) that would be as close as possible to a "simulated CRAN environment as it was at any given date". Our plan is to replay CRAN back in time and to instrument that platform to measure what we need for our ecological studies of CRAN.

The connection with this thread is the possibility of reusing this system to offer something useful for reproducible research, that is, a reproducible platform, in the sense of the reproducibility vs replicability distinction that Jeroen Ooms mentions. It would then be enough to record the date on which some R code was run on that platform (and perhaps whether it was a 32- or 64-bit system) to be able to rebuild a similar software environment, with all corresponding CRAN packages of the right version easily installable. In case something specific is required in addition to the software provided by default, Vagrant allows provisioning the virtual machine in an easy way too… but then, the provisioning script must be provided too (not much of a problem). The information required to rebuild the platform shrinks down to an ASCII text file of a few kB. This is easy to bundle with your R code in, say, the supplementary material of a publication.
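To make the "record only the date" idea concrete, here is a minimal sketch in Python. It is not the ECOS tooling: the package names, versions, release dates, and the manifest layout are all invented for illustration. It assumes that a release history per package is available, picks the newest version released on or before the recorded date, and writes the result as a small plain-text manifest.

```python
from datetime import date

# Hypothetical release history: package -> list of (version, release date).
# Names, versions and dates are invented; they are NOT real CRAN data.
RELEASES = {
    "pkgA": [("1.0-0", date(2013, 1, 10)),
             ("1.0-1", date(2013, 9, 2)),
             ("1.1-0", date(2014, 3, 1))],
    "pkgB": [("0.9", date(2012, 6, 15)),
             ("1.0", date(2014, 4, 1))],
    "pkgC": [("0.1", date(2014, 5, 20))],
}


def snapshot(releases, as_of):
    """Return {package: version} with the newest release on or before as_of."""
    result = {}
    for pkg, history in releases.items():
        # Keep only releases that already existed on the recorded date.
        eligible = [(released, version)
                    for version, released in history
                    if released <= as_of]
        if eligible:
            # max() on (date, version) tuples picks the latest release date.
            result[pkg] = max(eligible)[1]
    return result


def manifest(as_of, arch, packages):
    """Render the snapshot as a small ASCII text file (assumed layout)."""
    lines = ["date: " + as_of.isoformat(), "arch: " + str(arch)]
    lines += [pkg + " " + ver for pkg, ver in sorted(packages.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    run_date = date(2014, 3, 21)
    print(manifest(run_date, 64, snapshot(RELEASES, run_date)), end="")
```

In this toy data, pkgC does not appear in the manifest for 2014-03-21 because its first release comes later, and pkgB stays at the version that was current on that date. In practice only the date and architecture lines would need to be recorded, since the package list can be recomputed from them.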

Please keep in mind that many platform-specific features in R (graphic devices, string encoding, and many more) may also be a problem for reproducing published results. Hence the idea of using a virtual machine with only one OS, Linux, no matter whether you work on Windows, Mac OS X, or… Solaris (anyone there?).


On 20 Mar 2014, at 21:53, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:

> On Thu, Mar 20, 2014 at 1:28 PM, Ted Byers <r.ted.byers at gmail.com> wrote:
>> Herve Pages mentions the risk of irreproducibility across three minor
>> revisions of version 1.0 of Matrix.  My gut reaction would be that if the
>> results are not reproducible across such minor revisions of one library,
>> they are probably just so much BS.
> Perhaps this is just terminology, but what you refer to I would generally
> call 'replication'. Of course being able to replicate results with other
> data or other software is important to validate claims. But being able to
> reproduce how the original results were obtained is an important part of
> this process.
> If someone is publishing results that I think are questionable and I cannot
> replicate them, I want to know exactly how those outcomes were obtained in
> the first place, so that I can 'debug' the problem. It's quite important to
> be able to trace back if incorrect results were a result of a bug,
> incompetence or fraud.
> Let's take the example of the Reinhart and Rogoff case. The results
> obviously were not replicable, but without more information it was just the
> word of a grad student vs two Harvard professors. Only after reproducing
> the original analysis was it possible to point out the errors and prove
> that the original results were incorrect.