[Rd] [RFC] A case for freezing CRAN

Thu Mar 20 07:15:56 CET 2014

----- Original Message -----
> From: "David Winsemius" <dwinsemius at comcast.net>
> To: "Jeroen Ooms" <jeroen.ooms at stat.ucla.edu>
> Cc: "r-devel" <r-devel at r-project.org>
> Sent: Wednesday, March 19, 2014 11:03:32 PM
> Subject: Re: [Rd] [RFC] A case for freezing CRAN
> 
> 
> On Mar 19, 2014, at 7:45 PM, Jeroen Ooms wrote:
> 
> > On Wed, Mar 19, 2014 at 6:55 PM, Michael Weylandt
> > <michael.weylandt at gmail.com> wrote:
> >> Reading this thread again, is it a fair summary of your position
> >> to say "reproducibility by default is more important than giving
> >> users access to the newest bug fixes and features by default?"
> >> It's certainly arguable, but I'm not sure I'm convinced: I'd
> >> imagine that the ratio of new work being done vs reproductions is
> >> rather high and the current setup optimizes for that already.
> > 
> > I think that separating development from released branches can give us
> > both reliability/reproducibility (stable branch) as well as new
> > features (unstable branch). The user gets to pick (and you can pick
> > both!). The same is true for r-base: when using a 'released' version
> > you get 'stable' base packages that are up to 12 months old. If you
> > want to have the latest stuff you download a nightly build of r-devel.
> > For regular users and reproducible research it is recommended to use
> > the stable branch. However if you are a developer (e.g. package
> > author) you might want to develop/test/check your work with the latest
> > r-devel.
> > 
> > I think that extending the R release cycle to CRAN would result both
> > in more stable released versions of R, as well as more freedom for
> > package authors to implement rigorous change in the unstable branch.
> > When writing a script that is part of a production pipeline, or a
> > Sweave paper that should be reproducible 10 years from now, or a book
> > on using R, you use a stable version of R, which is guaranteed to
> > behave the same over time. However when developing packages that
> > should be compatible with the upcoming release of R, you use r-devel
> > which has the latest versions of other CRAN and base packages.
> 
> 
> As I remember ... The example demonstrating the need for this was an
> extract from a website in which the headers were misinterpreted as
> data in one version of pkg:XML and not in another. That seems fairly
> unconvincing. Data cleaning and validation is a basic task of data
> analysis. It also seems excessive to assert that it is the
> responsibility of CRAN to maintain a synced binary archive that will
> be available in ten years.

CRAN already does this: the bin/windows/contrib directory has subdirectories going back to 1.7, with packages dated October 2004. I don't see why it would be burdensome to continue to archive these. It would be nice if source versions had a similar archive.

Dan

> Bug fixes would
> be inhibited for years.... not unlike SAS and Excel. What next?
> Perhaps all bugs should be labeled as features? Surely this
> CRAN-of-the-future would be offering something that no other
> statistical package currently offers, nicht wahr?
> 
> Why not leave it to the authors to specify which packages and version
> numbers were used in their publications? The authors of the packages
> would get recognition and the dependencies would be recorded.
> 
> --
> David.
> > 
> > 
> >> What I'm trying to figure out is why the standard "install the
> >> following list of package versions" isn't good enough in your
> >> eyes?
> > 
> > Almost nobody does this because it is cumbersome and impractical. We
> > can do so much better than this. Note that in order to install old
> > packages you also need to investigate which versions of dependencies
> > of those packages were used. On win/osx, users need to manually build
> > those packages which can be a pain. All in all it makes reproducible
> > research difficult and expensive and error prone. At the end of the
> > day most published results obtained with R just won't be
> > reproducible.
> > 
> > Also I believe that keeping it simple is essential for solutions to be
> > practical. If every script has to be run inside an environment with
> > custom libraries, it takes away much of its power. Running a bash or
> > python script in Linux is so easy and reliable that entire
> > distributions are based on it. I don't understand why we make our
> > lives so difficult in R.
> > 
> > In my estimation, a system where stable versions of R pull packages
> > from a stable branch of CRAN will naturally resolve the majority of
> > the reproducibility and reliability problems with R. And in contrast
> > to what some people here are suggesting it does not introduce any
> > limitations. If you want to get the latest stuff, you either grab a
> > copy of r-devel, or just enable the testing branch and off you go.
> > Debian 'testing' works in a similar way, see
> > http://www.debian.org/devel/testing.
> > 
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> David Winsemius
> Alameda, CA, USA