[Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
MEC at stowers.org
Tue Mar 5 15:48:11 CET 2013
What got me started on this line of inquiry was my attempt to balance the advantages of periodically (daily or weekly) updating the 'release' versions of locally installed R/Bioconductor packages in our institute-wide R installation against the disadvantage of potentially changing the results of an analyst's workflow mid-project.
I just got the "green light" to institute such periodic updates, which I have been arguing are in our collective best interest. In return, I promised my best effort to provide a means of preserving or reverting to a working R library configuration.
Please note that the reproducibility I am most eager to provide is limited to reproducibility within the computing environment of our institute, which perhaps removes some of the dragons' nests, though certainly not all.
There are technical issues with updating package installations on an NFS mount that may have files/libraries held open by running R sessions. I am also interested in learning of approaches for minimizing or eliminating exposure to these issues. The first/best approach seems to be to institute a 'black-out' period during which users should expect the installed library to change. Perhaps there are improvements on this?
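One common technique, not proposed in this thread, that could shorten the black-out period and also provide the "preserve or revert" guarantee is to install each update into a fresh, never-modified directory and then switch a `current` symlink to it in one atomic rename. The Python sketch below assumes a hypothetical layout (`root/releases/<tag>` plus a `root/current` link that the site library path would point at); it only manages directories and does not install any R packages. Two caveats to verify locally: NFS clients may briefly see the old link target because of client-side caching, and whether an R session resolves the link once at startup or on every access should be checked before relying on this.

```python
import os
import tempfile
from pathlib import Path


def new_release_dir(root, tag):
    """Create a fresh library directory for one release (hypothetical layout: root/releases/<tag>)."""
    release = Path(root) / "releases" / tag
    release.mkdir(parents=True, exist_ok=False)  # never reuse or overwrite an existing release
    return release


def activate(root, release):
    """Point root/current at `release` by renaming a temporary symlink over the old link.

    rename(2) replaces the link in a single step, so a reader sees either the old
    target or the new one, never a missing link."""
    link = Path(root) / "current"
    tmp = Path(root) / "current.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # clean up a leftover from an interrupted switch
    os.symlink(release, tmp)
    os.replace(tmp, link)


def current_release(root):
    """Return the directory that root/current points at."""
    return Path(os.readlink(Path(root) / "current"))


root = tempfile.mkdtemp()
week1 = new_release_dir(root, "2013-03-04")
activate(root, week1)
week2 = new_release_dir(root, "2013-03-11")
activate(root, week2)  # periodic update
activate(root, week1)  # revert: the old tree was never modified in place
print(current_release(root))
```

Because old release directories are left untouched, files already opened from them by running sessions are not rewritten underneath those sessions; old releases can be deleted once no session uses them.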
.From: Mike Marchywka [mailto:marchywka at hotmail.com]
.Sent: Tuesday, March 05, 2013 5:24 AM
.To: amackey at virginia.edu; Cook, Malcolm
.Cc: r-devel at r-project.org; bioconductor at r-project.org; r-discussion at listserv.stowers.org
.Subject: RE: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
.I hate to ask what got this thread started, but it sounds like someone was counting on
.exact numeric reproducibility, or was there a bug in a specific release? In actual
.fact, the best way to determine reproducibility is to run the code in a variety of
.packages. Alternatively, you can do everything in Java and not assume
.that calculations commute or associate as the code is modified, but that seems
.pointless. Sensitivity determination would seem to lead to more reproducible results
.than trying to keep a specific set of code quirks.
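The sensitivity idea can be illustrated with a minimal Python sketch (Python is used only for illustration; the thread concerns R). It assumes a hypothetical analysis step, here just a sample standard deviation, and measures how much its output moves when each input is perturbed by a tiny relative amount. If upgrading a package changes a result by less than this, the change is arguably within the analysis's own noise.

```python
import random
import statistics


def analysis(values):
    """Hypothetical analysis step: the sample standard deviation."""
    return statistics.stdev(values)


def sensitivity(fn, values, rel_noise=1e-9, trials=20, seed=1):
    """Rerun fn on slightly perturbed inputs and return the largest relative change in its output."""
    rng = random.Random(seed)  # fixed seed so the check itself is repeatable
    baseline = fn(values)
    worst = 0.0
    for _ in range(trials):
        perturbed = [v * (1 + rng.uniform(-rel_noise, rel_noise)) for v in values]
        worst = max(worst, abs(fn(perturbed) - baseline) / abs(baseline))
    return worst


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(f"max relative change: {sensitivity(analysis, data):.2e}")
```

The perturbation size and number of trials are arbitrary choices here; a real check would pick them to match the precision of the input data.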
.I also seem to recall that the FPU may produce random lower-order bits in some cases,
.so the same code and data can give different results. Always assume FP is stochastic and plan
.on analyzing the "noise."
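Even without any hardware randomness, floating-point addition is not associative, which is the point behind not assuming that calculations "commute or associate as the code is modified." The short Python check below uses IEEE 754 double precision (Python's `float` on common platforms): regrouping a three-term sum changes the last bit, so comparing reruns with a tolerance is safer than exact equality.

```python
import math

a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)                              # False
print(math.isclose(left, right, rel_tol=1e-12))   # True
```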
.> From: amackey at virginia.edu
.> Date: Mon, 4 Mar 2013 16:28:48 -0500
.> To: MEC at stowers.org
.> CC: r-devel at r-project.org; bioconductor at r-project.org; r-discussion at listserv.stowers.org
.> Subject: Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
.> On Mon, Mar 4, 2013 at 4:13 PM, Cook, Malcolm <MEC at stowers.org> wrote:
.> > * where do the dragons lurk
.> webs of interconnected dynamically loaded libraries, identical versions of
.> R compiled with different BLAS/LAPACK options, etc. Go with the VM if you
.> really, truly, want this level of exact reproducibility.
.> An alternative (and arguably more useful) strategy would be to cache
.> results of each computational step, and report when results differ upon
.> re-execution with identical inputs; if you cache sessionInfo along with
.> each result, you can identify which package(s) changed, and begin to hunt
.> down why the change occurred (possibly for the better); couple this with
.> the concept of keeping both code *and* results in version control, then you
.> can move forward with a (re)analysis without being crippled by out-of-date
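Mackey's caching strategy can be sketched as follows. This is a minimal, in-memory Python illustration rather than R code: `environment_info` is a hypothetical stand-in for caching R's `sessionInfo()`, the package names and versions are supplied by the caller, and the name `normlib` is made up. A real setup would persist the cache to disk and keep it in version control alongside the code.

```python
import hashlib
import json
import platform
import sys


def environment_info(packages):
    """Hypothetical stand-in for R's sessionInfo(): interpreter, platform and package versions."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(packages),
    }


def input_key(step_name, inputs):
    """Hash the step name and its (JSON-serializable) inputs into a cache key."""
    payload = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(cache, step_name, fn, inputs, packages):
    """Run one step; if identical inputs were seen before and the result differs,
    report which packages changed. The new result and its environment are then cached."""
    key = input_key(step_name, inputs)
    result = fn(**inputs)
    env = environment_info(packages)
    report = None
    previous = cache.get(key)
    if previous is not None and previous["result"] != result:
        old_pkgs = previous["env"]["packages"]
        new_pkgs = env["packages"]
        changed = sorted(
            name for name in old_pkgs.keys() | new_pkgs.keys()
            if old_pkgs.get(name) != new_pkgs.get(name)
        )
        report = {"step": step_name, "old": previous["result"],
                  "new": result, "changed_packages": changed}
    cache[key] = {"result": result, "env": env}
    return result, report


cache = {}
data = {"x": [1.0, 2.0, 3.0, 4.0]}
# First run with hypothetical package "normlib" 1.0.
run_step(cache, "mean", lambda x: sum(x) / len(x), data, {"normlib": "1.0"})
# Rerun after a simulated upgrade that changes the computation.
_, report = run_step(cache, "mean", lambda x: sum(x) / (len(x) - 1), data, {"normlib": "1.1"})
print(report)
```

This sketch overwrites the cached entry after reporting; keeping every cached result under version control, as suggested above, preserves the full history instead.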
.> Aaron J. Mackey, PhD
.> Assistant Professor
.> Center for Public Health Genomics
.> University of Virginia
.> amackey at virginia.edu
.> R-devel at r-project.org mailing list