[Rd] Docker versus Vagrant for reproducibility - was: The case for freezing CRAN

Philippe GROSJEAN Philippe.GROSJEAN at umons.ac.be
Fri Mar 21 14:03:26 CET 2014

 ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons University, Belgium
( ( ( ( (

On 21 Mar 2014, at 10:59, Rainer M Krug <Rainer at krugs.de> wrote:

> Dirk Eddelbuettel <edd at debian.org> writes:
>> o Roger correctly notes that R scripts and packages are just one issue.
>>   Compilers, libraries and the OS matter.  To me, the natural approach these
>>   days would be to think of something based on Docker or Vagrant or (if you
>>   must, VirtualBox).  The newer alternatives make snapshotting very cheap
>>   (e.g. by using Linux LXC).  That approach reproduces a full environment as
>>   best as we can while still ignoring the hardware layer (and some readers
>>   may recall the infamous Pentium bug of two decades ago).
> These two tools look very interesting - but I have, even after reading a
> few discussions of their differences, no idea which one is better suited
> to be used for what has been discussed here: Making it possible to run
> the analysis later to reproduce results using the same versions used in
> the initial analysis.
> Am I right in saying:
> - Vagrant uses VMs to emulate the hardware
> - Docker does not

> and therefore
> - Vagrant is slower and requires more space
> - Docker is faster and requires less space
It depends. For instance, if you run R in VirtualBox under Windows, it may even run faster, depending on the code you run and, say, the LAPACK library used. On Linux, R code typically runs 2-3% slower in the VM than natively, but on a Windows host most of my R code runs faster in the VM… But yes, you need more RAM.

With Vagrant, you do not need to keep your VM once you no longer use it. Disk space then shrinks to a few kB, corresponding to the Vagrant configuration file. I guess the same is true for Docker?

A big advantage of Vagrant + VirtualBox is that you get very similar virtual hardware, whether your host system is Linux, Windows or Mac OS X. I see this as a point in favour of better reproducibility.

> Therefore, could one say that Vagrant is more "robust" in the long run?
Maybe… but it depends almost entirely on how well VirtualBox will support old VMs in the future!


> How do they compare across platforms? Vagrant seems to
> be platform-agnostic: I can develop and run on Linux, Mac and Windows.
> How does that work with Docker?
> I just followed [1] and set up Docker on OS X - looks promising - it also
> uses an underlying VM. So both should be equal with regard to
> reproducibility in the long run?
> Please note: I see these questions in the light of this discussion of
> reproducibility, not with regard to application deployment, which is
> what most of the discussions on the web are about.
> Any comments, thoughts, remarks?
> Rainer
> Footnotes: 
> [1]  http://docs.docker.io/en/latest/installation/mac/
> -- 
> Rainer M. Krug
> email: Rainer<at>krugs<dot>de
> PGP: 0x0F52F982
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
