[Rd] package vignettes build in the same R process?

Sat Nov 8 18:01:46 CET 2014

On Sat, Nov 8, 2014 at 12:29 AM, Wolfgang Huber <whuber at embl.de> wrote:
> On Nov 2, 2014, at 16:10 GMT+1, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
>> On 01/11/2014, 8:44 PM, Martin Morgan wrote:
>>> If I understand correctly, all vignettes in a package are built in the same R
>>> process. Global options, loaded packages, etc., in an earlier vignette persist
>>> in later vignettes. This can introduce user confusion (e.g., when a later
>>> vignette builds successfully because a package is require()'ed in an earlier
>>> vignette, but not the current one), difficult-to-identify bugs (e.g., when
>>> a setting in an earlier vignette influences calculation in a later vignette),
>>> and misleading information about reproducibility (e.g., when the sessionInfo()
>>> of a later vignette reflects packages used in earlier vignettes).
>>>
>>> I believe the relevant code is at
>>>
>>> src/library/tools/R/Vignettes.R:505
>>>
>>>         output <- tryCatch({
>>>             ## FIXME: run this in a separate process
>>>             engine$weave(file, quiet = quiet)
>>>             setwd(startdir)
>>>             find_vignette_product(name, by = "weave", engine = engine)
>>>         }, error = function(e) {
>>>             stop(gettextf("processing vignette '%s' failed with diagnostics:\n%s",
>>>                  file, conditionMessage(e)), domain = NA, call. = FALSE)
>>>         })
>>>
>>> Is building of each vignette in separate processes a reasonable feature request?
>>
>> I'm not sure.  It's not perfect:  users may still see different output
>> than the package contains, because when they run the vignette it will
>> see their system state, but at least it gives them a way to get the
>> identical output.  On the other hand, they already have a way to do
>> that:  just build the whole package.  Overall I'd say it's probably a
>> good idea.
>
> Let the perfect be the enemy of the good?
> Martin’s proposed improvement would eliminate unnecessary complexity and a lot of potential (and actual) confusion.

I agree that this is likely a good move and will make the
reproducibility a bit more solid.  If this is changed, several things
have to be considered:

1. Make sure to run using the exact same R executable and architecture.
2. Make sure to use the exact same .libPaths(), which is particularly
important under R CMD check where it's composed of a minimum set of
temporary paths.
3. Preserve the working directory.  ...or should the working
directories also be unique in order to bulkhead the vignettes from
each other?
4. What other settings need to be set in order to replicate the state
of R CMD build/check?
5. How to deal with standard output and standard error?
6. How to propagate conditions such as warnings and errors?
7. Remember that buildVignette[s]() can be called manually too, not
only via R CMD build/check.
8. When you build a vignette manually via buildVignette(), should the
vignette change the state of R so it's available for
troubleshooting/debugging, inspecting variables and so on?
9. Maybe there is a suite of vignettes that need to be run
sequentially in order for them to work.  For instance, the first
vignette preprocesses the data and the second does EDA on it.  I don't
think this is currently supported, because I don't think the order in
which vignettes are processed is guaranteed (it depends on locale),
but maybe a decision on supporting/not supporting this needs to be
made.
10. Related to 9, when building vignettes in separate R processes, it
is tempting to also add support for parallel processing of vignettes.
If so, what decisions need to be made already now in order to allow
for that?
11. What else?
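
To make items 1, 2, 3, 5 and 6 a bit more concrete, here is a rough
sketch of what launching a child process could look like.  This is
not how 'tools' does it; run_in_fresh_r() is a made-up helper, and
the use of --vanilla is my assumption (R CMD build/check may well
need other startup settings, cf. item 4).

```r
## Hypothetical helper (not part of 'tools'): run R code in a fresh
## child process that mirrors some of the caller's setup.
run_in_fresh_r <- function(code, libs = .libPaths(), wd = getwd()) {
  ## (1) Same R executable and architecture as the calling session.
  rscript <- file.path(R.home("bin"), "Rscript")

  ## (2) + (3) Replicate library paths and working directory by
  ## writing them at the top of a temporary script.
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script))
  writeLines(c(
    sprintf(".libPaths(%s)", paste(deparse(libs), collapse = "")),
    sprintf("setwd(%s)", deparse(wd)),
    code
  ), script)

  ## (5) Capture standard output and standard error together.
  ## --vanilla is an assumption; it skips profiles and saved workspaces.
  out <- suppressWarnings(
    system2(rscript, c("--vanilla", shQuote(script)),
            stdout = TRUE, stderr = TRUE)
  )

  ## (6) Turn a non-zero exit status into an error in the caller.
  status <- attr(out, "status")
  if (!is.null(status) && status != 0L) {
    stop("child R process failed:\n", paste(out, collapse = "\n"),
         call. = FALSE)
  }
  out
}

## Possible usage, assuming a vignette file 'intro.Rnw' exists in the
## current directory (hypothetical file name):
## run_in_fresh_r('tools::buildVignette("intro.Rnw", quiet = TRUE)')
```

Note that this only propagates errors; warnings raised in the child
end up as text in the captured output, so item 6 is only half
addressed here.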

So any change made needs to be done with great care.  Adding a
local=TRUE argument to buildVignette[s]() could be a way to please
both worlds and allow us to move safely forward until success is
proven.  Other things, such as cleaning up after the vignette engine,
may become easier when running in a separate process, e.g. closing
stray graphics devices.

BTW, an alternative to running in a separate process would be to have
a res <- sandbox({ ... }) that resets the state of R to the entry
state upon exit.  The major hurdle I see for achieving that is the
fact that packages cannot be unloaded properly.  One implementation of
sandbox({ ... }) would probably be to launch a separate R process.
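
For illustration only, a process-based sandbox() could look roughly
like the sketch below.  The name is the one I used above; everything
else (passing the expression as deparsed text, returning the value
via an RDS file, using --vanilla) is an assumption made for the
sketch.  In particular, the expression cannot see variables from the
calling session, and the child's output is simply discarded.

```r
## Hypothetical sandbox(): evaluate an expression in a separate R
## process and return its value; the caller's state is untouched.
sandbox <- function(expr) {
  code   <- deparse(substitute(expr))   # expression as source text
  result <- tempfile(fileext = ".rds")  # where the child stores the value
  script <- tempfile(fileext = ".R")
  on.exit(unlink(c(result, script)))

  writeLines(c(
    "value <- {", code, "}",
    sprintf("saveRDS(value, %s)", deparse(result))
  ), script)

  ## Use the Rscript that belongs to the running R installation.
  status <- system2(file.path(R.home("bin"), "Rscript"),
                    c("--vanilla", shQuote(script)),
                    stdout = FALSE, stderr = FALSE)
  if (status != 0L) stop("sandboxed expression failed", call. = FALSE)

  readRDS(result)
}

## Options set and packages attached inside the sandbox do not leak
## into the calling session:
res <- sandbox({ options(digits = 3); library(splines); 1 + 1 })
```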

/Henrik

>
> Wolfgang Huber
>
>>
>> I would prefer a way to detect and warn when vignette output depends on
>> the state outside the vignette, but that looks hard to do.
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel