[Rd] [RFC] A case for freezing CRAN

Roger Bivand roger.bivand at nhh.no
Thu Mar 20 12:37:51 CET 2014

Gavin Simpson <ucfagls <at> gmail.com> writes:

> To my mind it is incumbent upon those wanting reproducibility to build
> the tools to enable users to reproduce works. When you write a paper
> or release a tool, you will have tested it with a specific set of
> packages. It is relatively easy to work out what those versions are
> (there are tools in R for this). What is required is an automated way
> to record that info in an agreed upon way in an approved
> file/location, and have a tool that facilitates setting up a package
> library sufficient with which to reproduce a work. That approval
> doesn't need to come from CRAN or R Core - we can store anything in
> ./inst.


Thanks for contributing useful insights. With reference to Jeroen's proposal
and the discussion so far, I can see where the problem lies, but the
proposed solutions are very invasive. What might offer a less invasive
resolution is through a robust and predictable schema for sessionInfo()
content, permitting ready parsing, so that (using Hadley's interjection) the
reproducer could reconstruct the original execution environment at least as
far as R and package versions are concerned.

In fact, I'd argue that the responsibility for securing reproducibility lies
with the originating author or organisation, so that work where
reproducibility is desired should include such a standardised record. 

There is an additional problem not addressed directly in this thread but
mentioned in some contributions, upstream of R. The further problem upstream
is actually in the external dependencies and compilers, beyond that in
hardware. So raising consciousness about the importance of being able to
query version information to enable reproducibility is important.

Next, encapsulating the information permitting its parsing would perhaps
enable the original execution environment to be reconstructed locally by
installing external dependencies, then R, then packages from source, using
the same versions of build train components if possible (and noting
mismatches if not). Maybe ressurect StatDataML in addition to RData
serialization of the version dependencies? Of course, current R and package
versions may provide reproducibility, but if they don't, one would use the
parseable record of the original development environment 

> Reproducibility is a very important part of doing "science", but not
> everyone using CRAN is doing that. Why force everyone to march to the
> reproducibility drum? I would place the onus elsewhere to make this
> work.



> Gavin
> A scientist, very much interested in reproducibility of my work and others.
> >
> > ______________________________________________
> > R-devel <at> r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel

More information about the R-devel mailing list