[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

Henrik Bengtsson henr|k@bengt@@on @end|ng |rom gm@||@com
Fri Jan 10 16:58:13 CET 2020


The RStudio GUI was just one example.  AFAIK, and please correct me if
I'm wrong, another example is where multi-threaded code is used in
forked processing and that's sometimes unstable.  Yes another, which
might be multi-thread related or not, is
https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:

res <- parallel::mclapply(urls, function(url) {
  download.file(url, basename(url))
})

That was reported to fail on macOS with the default method="libcurl"
but not for method="curl" or method="wget".

Further documentation is needed and would help but I don't believe
it's sufficient to solve everyday problems.  The argument for
introducing an option/env var to disable forking is to give the end
user a quick workaround for newly introduced bugs.  Neither the
develop nor the end user have full control of the R package stack,
which is always in flux.  For instance, above mclapply() code might
have been in a package on CRAN and then all of a sudden
method="libcurl" became the new default in base R.  The above
mclapply() code is now buggy on macOS, and not necessarily caught by
CRAN checks.  The package developer might not notice this because they
are on Linux or Windows.  It can take a very long time before this
problem is even noticed and even further before it is tracked down and
fixed.   Similarly, as more and more code turn to native code and it
becomes easier and easier to implement multi-threading, more and more
of these bugs across package dependencies risk sneaking in the
backdoor wherever forked processing is in place.

For the end user, but also higher-up upstream package developers, the
quickest workaround would be disable forking.  If you're conservative,
you could even disable it all of your R processing.  Being able to
quickly disable forking will also provide a mechanism for quickly
testing the hypothesis that forking is the underlying problem, i.e.
"Please retry with options(fork.allowed = FALSE)" will become handy
for troubleshooting.

/Henrik

On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
<simon.urbanek using r-project.org> wrote:
>
> If I understand the thread correctly this is an RStudio issue and I would suggest that the developers consider using pthread_atfork() so RStudio can handle forking as they deem fit (bail out with an error or make RStudio work).  Note that in principle the functionality requested here can be easily implemented in a package so R doesn’t need to be modified.
>
> Cheers,
> Simon
>
> Sent from my iPhone
>
> >> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
> >>
> >> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
> >> I'd like to pick up this thread started on 2019-04-11
> >> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
> >> Modulo all the other suggestions in this thread, would my proposal of
> >> being able to disable forked processing via an option or an
> >> environment variable make sense?
> >
> > I don't think R should be doing that. There are caveats with using fork, and they are mentioned in the documentation of the parallel package, so people can easily avoid functions that use it, and this all has been discussed here recently.
> >
> > If it is the case, we can expand the documentation in parallel package, add a warning against the use of forking with RStudio, but for that I it would be good to know at least why it is not working. From the github issue I have the impression that it is not really known why, whether it could be fixed, and if so, where. The same github issue reflects also that some people want to use forking for performance reasons, and even with RStudio, at least on Linux. Perhaps it could be fixed? Perhaps it is just some race condition somewhere?
> >
> > Tomas
> >
> >> I've prototyped a working patch that
> >> works like:
> >>> options(fork.allowed = FALSE)
> >>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
> >> [1] 14058 14058
> >>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
> >> [1] 14058 14058
> >>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
> >> [1] 14058.1 14058.2
> >>> f <- parallel::mcparallel(Sys.getpid())
> >> Error in allowFork(assert = TRUE) :
> >>  Forked processing is not allowed per option ‘fork.allowed’ or
> >> environment variable ‘R_FORK_ALLOWED’
> >>> cl <- parallel::makeForkCluster(1L)
> >> Error in allowFork(assert = TRUE) :
> >>  Forked processing is not allowed per option ‘fork.allowed’ or
> >> environment variable ‘R_FORK_ALLOWED’
> >> The patch is:
> >> Index: src/library/parallel/R/unix/forkCluster.R
> >> ===================================================================
> >> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
> >> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
> >> @@ -30,6 +30,7 @@
> >> newForkNode <- function(..., options = defaultClusterOptions, rank)
> >> {
> >> +    allowFork(assert = TRUE)
> >>     options <- addClusterOptions(options, list(...))
> >>     outfile <- getClusterOption("outfile", options)
> >>     port <- getClusterOption("port", options)
> >> Index: src/library/parallel/R/unix/mclapply.R
> >> ===================================================================
> >> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
> >> +++ src/library/parallel/R/unix/mclapply.R (working copy)
> >> @@ -28,7 +28,7 @@
> >>         stop("'mc.cores' must be >= 1")
> >>     .check_ncores(cores)
> >> -    if (isChild() && !isTRUE(mc.allow.recursive))
> >> +    if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
> >>         return(lapply(X = X, FUN = FUN, ...))
> >>     ## Follow lapply
> >> Index: src/library/parallel/R/unix/mcparallel.R
> >> ===================================================================
> >> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
> >> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
> >> @@ -20,6 +20,7 @@
> >> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
> >> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
> >> {
> >> +    allowFork(assert = TRUE)
> >>     f <- mcfork(detached)
> >>     env <- parent.frame()
> >>     if (isTRUE(mc.set.seed)) mc.advance.stream()
> >> Index: src/library/parallel/R/unix/pvec.R
> >> ===================================================================
> >> --- src/library/parallel/R/unix/pvec.R (revision 77648)
> >> +++ src/library/parallel/R/unix/pvec.R (working copy)
> >> @@ -25,7 +25,7 @@
> >>     cores <- as.integer(mc.cores)
> >>     if(cores < 1L) stop("'mc.cores' must be >= 1")
> >> -    if(cores == 1L) return(FUN(v, ...))
> >> +    if(cores == 1L || !allowFork()) return(FUN(v, ...))
> >>     .check_ncores(cores)
> >>     if(mc.set.seed) mc.reset.stream()
> >> with a new file src/library/parallel/R/unix/allowFork.R:
> >> allowFork <- function(assert = FALSE) {
> >>    value <- Sys.getenv("R_FORK_ALLOWED")
> >>    if (nzchar(value)) {
> >>        value <- switch(value,
> >>           "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
> >>           "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
> >>            stop(gettextf("invalid environment variable value: %s==%s",
> >>           "R_FORK_ALLOWED", value)))
> >> value <- as.logical(value)
> >>    } else {
> >>        value <- TRUE
> >>    }
> >>    value <- getOption("fork.allowed", value)
> >>    if (is.na(value)) {
> >>        stop(gettextf("invalid option value: %s==%s", "fork.allowed", value))
> >>    }
> >>    if (assert && !value) {
> >>      stop(gettextf("Forked processing is not allowed per option %s or
> >> environment variable %s", sQuote("fork.allowed"),
> >> sQuote("R_FORK_ALLOWED")))
> >>    }
> >>    value
> >> }
> >> /Henrik
> >>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
> >>> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
> >>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
> >>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
> >>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinushey using gmail.com> wrote:
> >>>>>>> I think it's worth saying that mclapply() works as documented
> >>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and memory
> >>>>>> overcommitment, and that this means that it may work nicely or fail
> >>>>>> spectacularly depending on whether, e.g., you operate on a long
> >>>>>> vector.
> >>>>> R cannot possibly replicate documentation of the underlying operating
> >>>>> systems. It clearly says that fork() is used and readers who may not
> >>>>> know what fork() is need to learn it from external sources.
> >>>>> Copy-on-write is an elementary property of fork().
> >>>> Just to be precise, copy-on-write is an optimization widely deployed
> >>>> in most modern *nixes, particularly for the architectures in which R
> >>>> usually runs. But it is not an elementary property; it is not even
> >>>> possible without an MMU.
> >>> Yes, old Unix systems without virtual memory had fork eagerly copying.
> >>> Not relevant today, and certainly not for systems that run R, but indeed
> >>> people interested in OS internals can look elsewhere for more precise
> >>> information.
> >>> Tomas
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list