[Rd] Force quitting a FORK cluster node on macOS and Solaris wreaks havoc

Henrik Bengtsson henr|k@bengt@@on @end|ng |rom gm@||@com
Thu Aug 12 10:22:05 CEST 2021


The following smells like a bug in R to me, because it puts the main R
session into an unstable state.  Consider the following R script:

a <- 42
message("a=", a)
cl <- parallel::makeCluster(1L, type="FORK")
try(parallel::clusterEvalQ(cl, quit(save="no")))
message("parallel:::isChild()=", parallel:::isChild())
message("a=", a)
rm(a)

The purpose of this was to emulate what happens when an parallel
workers crashes.

Now, if you source() the above on macOS, you might(*) end up with:

> a <- 42
> message("a=", a)
a=42
> cl <- parallel::makeCluster(1L, type="FORK")
> try(parallel::clusterEvalQ(cl, quit(save="no")))
Error: Error in unserialize(node$con) : error reading from connection
> message("parallel:::isChild()=", parallel:::isChild())
parallel:::isChild()=FALSE
> message("a=", a)
a=42
> rm(a)
> try(parallel::clusterEvalQ(cl, quit(save="no")))
Error: Error in unserialize(node$con) : error reading from connection
> message("parallel:::isChild()=", parallel:::isChild())
parallel:::isChild()=FALSE
> message("a=", a)
Error: Error in message("a=", a) : object 'a' not found
Execution halted

Note how 'rm(a)' is supposed to be the last line of code to be
evaluated.  However, the force quitting of the FORK cluster node
appears to result in the main code being evaluated twice (in
parallel?).

(*) This does not happen on all macOS variants. For example, it works
fine on CRAN's 'r-release-macos-x86_64' but it does give the above
behavior on 'r-release-macos-arm64'.  I can reproduce it on GitHub
Actions (https://github.com/HenrikBengtsson/teeny/runs/3309235106?check_suite_focus=true#step:10:219)
but not on R-hub's 'macos-highsierra-release' and
'macos-highsierra-release-cran'.  I can also reproduce it on R-hub's
'solaris-x86-patched' and solaris-x86-patched-ods' machines.  However,
I still haven't found a Linux machine where this happens.

If one replaces quit(save="no") with tools::pskill(Sys.getpid()) or
parallel:::mcexit(0L), this behavior does not take place (at least not
on GitHub Actions and R-hub).

I don't have access to a macOS or a Solaris machine, so I cannot
investigate further myself. For example, could it be an issue with
quit(), or does is it possible to trigger by other means? And more
importantly, should this be fixed? Also, I'd be curious what happens
if you run the above in an interactive R session.

/Henrik



More information about the R-devel mailing list