[Rd] User interrupts parallel execution. Why does it work, or why not?

Jiefei Wang @zwj|08 @end|ng |rom gm@||@com
Tue Jul 20 12:26:16 CEST 2021


Thanks for your explanation. This makes a lot of sense! SIGINT
handling is a blind spot for me, so this introduction looks perfect!

Best,
Jiefei

On Tue, Jul 20, 2021 at 4:31 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>
> Hi Jiefei,
>
> when you run the cluster "automatically" in your terminal and press
> Ctrl-C in Unix, both the master and the worker processes get the SIGINT
> signal, because they belong to the same foreground process group. So you
> are also directly interrupting the worker process.
>
> When you run the cluster "manually", that is, the master in one terminal
> window and the worker in another, they are in different process groups,
> and if you press Ctrl-C in the terminal running the master, only the
> master will receive the SIGINT signal, not the worker.
>
> If you want to read more of the sources, look for SIGINT handling in R,
> the onintrEx() function, etc. A good resource on signal handling is, e.g.,
> http://www.linusakesson.net/programming/tty/
>
> Best
> Tomas
>
> On 7/20/21 9:55 AM, Jiefei Wang wrote:
> > Hi all,
> >
> > I just noticed this interesting problem a few days ago, but I cannot
> > find an answer to it. Say you have a long-running job in a cluster
> > made by the parallel package, and you decide to stop the execution by
> > pressing Ctrl+C in the terminal or the stop button in RStudio for
> > some reason. After the interrupt, is the cluster still valid or not?
> > Below is a simple example:
> >
> > library(parallel)
> > cl <- makeCluster(1)
> > ## run and interrupt it
> > parLapply(cl, 1, function(i){Sys.sleep(10);Sys.getpid()})
> > ## run another apply function to check the cluster status
> > parLapply(cl, 1, function(i)i)
> >
> > From my test results, the answer is yes. The worker is interrupted
> > immediately and the cluster is ready for the next command, but when I
> > create the worker manually, things seem different.
> >
> > library(parallel)
> > cl <- makeCluster(1, manual = TRUE)
> > ## run and interrupt it
> > parLapply(cl, 1, function(i){Sys.sleep(10);Sys.getpid()})
> > ## run another apply function to check the cluster status
> > parLapply(cl, 1, function(i)i)
> >
> > It seems like the worker does not know the manager has been
> > interrupted and keeps running the current task. I have to wait for 10
> > seconds before I get the result from the last line of the code, and
> > the return value is the PID from the first apply call.
> >
> > Both cases are reasonable, but it is surprising to see them at the
> > same time. I started to wonder how the user interrupt is handled, so I
> > looked at the code in the parallel package. However, there seems to be
> > no related code: there is no try-catch statement in the manager's code
> > to handle the user interrupt, yet the worker just magically knows it
> > should stop the current execution.
> >
> > I see this behavior on both Windows and Ubuntu. It is somewhat beyond
> > my knowledge, so I wonder if anyone can help me with it. Does the
> > cluster support the user interrupt? Why does the above code work in one
> > case but not in the other? Many thanks!
> >
> > Best,
> > Jiefei
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
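A minimal sketch of the mechanism explained in the reply, written in R: when an R process receives SIGINT, R turns it into a condition of class "interrupt", which tryCatch() can handle. A loop that catches it abandons only the current task, and the process stays alive for the next one. The helpers run_task() and slow_task() are hypothetical stand-ins, not code from the parallel package, and Ctrl-C is simulated by sending SIGINT to the current process with tools::pskill(), which assumes a Unix-alike system.

```r
## Hypothetical stand-in for a cluster worker's task runner (NOT the actual
## worker loop from the parallel package): the task runs inside tryCatch(),
## and an "interrupt" condition abandons the task instead of ending R.
run_task <- function(task) {
  tryCatch(task(), interrupt = function(e) "interrupted")
}

## Hypothetical long-running task. It first sends SIGINT to its own process
## (Unix only), mimicking what the terminal does for every process in the
## foreground process group when Ctrl-C is pressed.
slow_task <- function() {
  tools::pskill(Sys.getpid(), tools::SIGINT)
  Sys.sleep(10)  # R checks for pending interrupts while sleeping
  Sys.getpid()   # never reached if the interrupt is delivered
}

result <- run_task(slow_task)
print(result)

## The same R process is still alive and runs the next task normally.
print(run_task(function() 1 + 1))
```

Run from a terminal, this mirrors the "automatic" case: the signal ends the task, not the R process, so the next request can be served. In the "manual" case the worker sits in another process group, never receives the SIGINT, and therefore finishes the old task first. Whether the parallel package's own worker loop handles the condition exactly like this is an assumption here; the sources mentioned in the reply are the place to check.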


