Interrupts (was Re: [Rd] X11 protocol errors ...)

Luke Tierney luke@nokomis.stat.umn.edu
Thu, 23 Aug 2001 09:39:37 -0500


On Thu, Aug 23, 2001 at 08:16:09AM -0400, Duncan Murdoch wrote:
> On Wed, 22 Aug 2001 19:32:51 -0500, you wrote:
> >I'm surprised we don't get more of these sorts of things on UNIX.  Our
> >current UNIX interrupt handling approach takes an immediate LONGJMP
> >out of the signal handler no matter where the signal occurs (except
> >for two places where signals are suspended).  Any place where an
> >invariant is temporarily broken, any place where an assignment is not
> >yet complete, is a potential trouble spot.
> 
> Delphi protects against these kinds of errors with "try ... except ..."
> and "try ... finally ..." blocks to deal with exceptions.  The first
> one executes special code if an exception occurs, the second
> guarantees execution of cleanup code.
> 
> These are implemented as a linked list of records on the stack which
> the exception handler knows how to interpret.  When a particular kind
> of exception occurs, all "finally" blocks are executed in the
> appropriate order (and the stack base pointer is moved to simulate
> exits from all active routines) until an "except" block handling the
> particular kind of exception is reached.
> 
> C doesn't have these statements built in, but presumably someone has
> written macros to do the same sort of thing.  Adding them would be a
> lot of work, but would be worthwhile.
> 

I'm talking about something related but different: controlling the
point at which an asynchronous signal is brought into the system (and
turned into an exception if we have a proper exception system.)  R
currently has on.exit, and Robert Gentleman and I proposed a more
structured exception mechanism for possible addition to R in the near
future.

[I sent a posting about the proposed mechanism a while back.  So far
we have received little feedback, so here is another request: Please
have a look at http://www.stat.umn.edu/~luke/R/exceptions/simpcond.html
and let us know if you have any comments/suggestions]

But that is not the issue here.  The issue is whether we allow a
SIGINT signal in UNIX (and whatever its analog is on other systems) to
interrupt the current calculation immediately, no matter where it
might be, or whether we impose more structure.  Windows/Mac pretty
much force more structure at the C level, since their analogs have to
arrive through mechanisms that require explicit polling.  So on Windows
you know that an expression like

	x = malloc(n)

will not get interrupted between the malloc call and the assignment to
x (unless some very low level tricks are involved).  On UNIX, the
signal can arrive in between those two operations.
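
[For a short critical section like this one, a UNIX program can get the
same guarantee by suspending delivery of SIGINT around it.  The sketch
below assumes a single-threaded POSIX process; alloc_uninterrupted is a
hypothetical helper made up for illustration, not anything in R.]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Allocates n bytes and stores the pointer in *slot with SIGINT blocked,
   so no SIGINT handler can run between the malloc call and the
   assignment.  Returns 0 on success, -1 on failure. */
int alloc_uninterrupted(void **slot, size_t n)
{
    sigset_t block, saved;

    assert(slot != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    *slot = malloc(n);          /* call and assignment are now atomic
                                   with respect to SIGINT */

    /* Restoring the old mask lets any SIGINT that arrived in the
       meantime be delivered now, after the assignment is complete. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
        return -1;
    return *slot != NULL ? 0 : -1;
}
```

This costs two system calls per protected region, so it probably only
makes sense around a handful of sensitive spots.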

The safe thing to do on UNIX is to have the signal handler just set a
flag which is then checked at appropriate points.  This is the
approach that John Eaton mentioned, and is used by most Scheme systems
I've looked at.  I suspect Python and Perl do this as well, but I'll
have to check.  This is also the way Java handles thread interrupts.
It would make the UNIX behavior identical to the Windows behavior.
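
[A minimal sketch of the flag approach, assuming POSIX sigaction.  The
names interrupt_pending, check_interrupt and interruptible_sum are made
up for illustration and are not taken from R, Octave or any Scheme
system.]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler, read by ordinary code.  sig_atomic_t is the
   type the C standard guarantees can be stored safely from a handler. */
static volatile sig_atomic_t interrupt_pending = 0;

/* The handler only records that SIGINT arrived; it never jumps. */
static void on_sigint(int signo)
{
    (void) signo;
    interrupt_pending = 1;
}

/* Installs the handler.  Returns 0 on success, -1 on failure. */
int install_interrupt_flag(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Called only at points where all invariants hold.  Returns 1 and
   clears the flag if an interrupt was pending, 0 otherwise. */
int check_interrupt(void)
{
    if (interrupt_pending) {
        interrupt_pending = 0;
        return 1;
    }
    return 0;
}

/* Hypothetical long computation: sums 0..n-1, polling once per
   iteration.  Returns the number of iterations completed. */
long interruptible_sum(long n, long *result)
{
    long i, sum = 0;

    assert(result != NULL);
    for (i = 0; i < n; i++) {
        if (check_interrupt())
            break;              /* consistent: sum covers 0..i-1 */
        sum += i;
    }
    *result = sum;
    return i;
}
```

The computation decides where it can be stopped, so it can always hand
back a consistent partial result.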

The drawback for systems like R and Octave is that we rely on being
able to use chunks of C/Fortran that can potentially run for a long
time (forever if they happen to get into infinite loops occasionally)
and where it is either impractical or impossible to insert flag
checking code.  For those situations it is nice to be able to use a
signal handler to force a jump out of that code.  We live without this
ability on Windows/Mac, and don't do too badly there, but it would be
nice not to completely lose this facility on UNIX.  Most numerical code
tends to not behave too badly when exited by a longjmp, but there are
no guarantees.  For example, if a piece of C code does something like this:

	static int inited = FALSE;
	if (! inited) {
	    inited = TRUE;
	    ... initialize a table needed for computations ...
	}
	... use the table ...

and a Control-C arrives in the first call after inited=TRUE is executed
but before the table is fully initialized, then future calls to this
function will happily return nonsense.
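
[The failure can be reproduced deterministically.  In the sketch below
the Control-C is simulated with raise(SIGINT) at a chosen point during
initialization, and the handler jumps out with siglongjmp.  The table
of squares, the interrupt_at parameter and the function names are all
invented for the demonstration.]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

#define TABLE_SIZE 8

static sigjmp_buf toplevel;         /* where the handler jumps to */
static int table[TABLE_SIZE];       /* all zero until initialized */
static int inited = 0;

/* Handler in the style described above: jump out immediately. */
static void jump_on_sigint(int signo)
{
    (void) signo;
    siglongjmp(toplevel, 1);
}

/* Looks up i*i in a lazily built table.  If interrupt_at is in
   0..TABLE_SIZE-1, a SIGINT is raised just before that entry is
   filled, standing in for a Control-C arriving at that moment. */
int square_lookup(int i, int interrupt_at)
{
    int k;

    assert(i >= 0 && i < TABLE_SIZE);
    if (!inited) {
        inited = 1;                 /* set before the work is done */
        for (k = 0; k < TABLE_SIZE; k++) {
            if (k == interrupt_at)
                raise(SIGINT);
            table[k] = k * k;
        }
    }
    return table[i];
}

/* Runs one lookup under the jumping handler.  Returns the result,
   -1 if the call was abandoned by the jump, -2 on setup failure. */
int guarded_lookup(int i, int interrupt_at)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_on_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -2;
    if (sigsetjmp(toplevel, 1) != 0)
        return -1;                  /* arrived here from the handler */
    return square_lookup(i, interrupt_at);
}
```

After guarded_lookup(5, 3) has been abandoned, guarded_lookup(5, -1)
returns 0 rather than 25, and nothing in the function notices.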

One option would be to tag routines at library registration time as
safe for LONGJMP's or not.  That way we can disable LONGJMP
interrupts everywhere except in explicitly marked .C or .Fortran calls
(and blocking IO operations).  This will ensure that no internal R
state gets messed up by asynchronous signals that arrive at an
inopportune time.
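
[One way such tagging might look, as a sketch only.  The routine_entry
record and its longjmp_safe field are hypothetical and do not
correspond to any existing R registration interface; example_routine
raises SIGINT on itself to stand in for a Control-C.]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

typedef void (*routine_fn)(void);

/* Hypothetical registration record; the flag is supplied by whoever
   registers the routine. */
struct routine_entry {
    const char *name;
    routine_fn  fn;
    int         longjmp_safe;
};

static sigjmp_buf call_context;
static volatile sig_atomic_t jump_allowed = 0;
volatile sig_atomic_t interrupt_pending = 0;
int steps_done = 0;

/* Jump only while a routine marked safe is running; otherwise just
   record the interrupt for a later poll. */
static void on_sigint(int signo)
{
    (void) signo;
    if (jump_allowed) {
        jump_allowed = 0;
        siglongjmp(call_context, 1);
    }
    interrupt_pending = 1;
}

int install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Calls a registered routine.  Returns 0 if it ran to completion,
   1 if it was abandoned by a jump out of the handler. */
int call_registered(const struct routine_entry *e)
{
    assert(e != NULL && e->fn != NULL);
    if (e->longjmp_safe) {
        if (sigsetjmp(call_context, 1) != 0)
            return 1;
        jump_allowed = 1;
    }
    e->fn();
    jump_allowed = 0;
    return 0;
}

/* Example routine: a Control-C "arrives" halfway through. */
void example_routine(void)
{
    steps_done = 1;
    raise(SIGINT);
    steps_done = 2;
}
```

Registered with longjmp_safe set, example_routine is cut off after its
first step; registered without it, the routine runs to the end and the
interrupt is left pending in the flag.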

But this only addresses the C level.  On Windows/Mac, the place where
a user break is turned into an R exception is (mainly) in the internal
eval, where every 1000 calculations (or some such number) the flag is
checked and a jump is done if the flag is set.  UNIX would work the
same way.  Since the internals know exactly where this jump can occur,
unlike jumps out of a signal handler, they can make sure all internal
state is consistent before checking the flag.
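
[A toy version of that polling scheme, to make the control flow
concrete.  It is not R's evaluator: eval_step, run_steps and
request_interrupt are invented names, and request_interrupt stands in
for whatever sets the flag (a signal handler on UNIX, the event loop on
Windows/Mac).]

```c
#include <assert.h>
#include <setjmp.h>
#include <signal.h>

#define CHECK_INTERVAL 1000     /* "or some such number" */

static jmp_buf toplevel;
static volatile sig_atomic_t interrupt_pending = 0;
static long eval_count = 0;

/* Stand-in for the code that notices a user break. */
void request_interrupt(void)
{
    interrupt_pending = 1;
}

/* One evaluation step.  The flag is examined only after the step's
   own update is complete, so the jump always leaves from a point
   where the state is consistent. */
static void eval_step(long *acc)
{
    *acc += 1;                  /* the "work" of this step */
    if (++eval_count % CHECK_INTERVAL == 0 && interrupt_pending) {
        interrupt_pending = 0;
        longjmp(toplevel, 1);
    }
}

/* Runs up to n steps from a fresh state.  Returns the number of steps
   completed, which is less than n if an interrupt was taken. */
long run_steps(long n)
{
    static long acc;            /* static: value survives the longjmp */
    long i;

    assert(n >= 0);
    acc = 0;
    eval_count = 0;
    if (setjmp(toplevel) != 0)
        return acc;             /* arrived here via longjmp */
    for (i = 0; i < n; i++)
        eval_step(&acc);
    return acc;
}
```

With a request pending, run_steps(2500) stops after exactly 1000 steps,
the first point at which the flag is examined.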

From the R level things look different: the 1000'th eval step can
happen anywhere, so a piece of R code that does

	file <- file(file, "w")
	on.exit(close(file))
	... do something with file ...

has a race condition: an interrupt that arrives between the creation
of the file and the registration of the on.exit handler will leave
the file open.  Something along the lines of

	without.interrupts({
	    file <- file(file, "w")
	    on.exit(close(file))
	    with.interrupts(... do something with file ...)
	})

would be safe but is too awkward in this form. [Using a structured
exception handling mechanism, some sort of try/finally construct,
would make this code cleaner but would not resolve the race
condition.]

There are no easy solutions I think, but we need to look at a range of
options and see what works best.

[Threads add the additional problem that an interrupted thread might
be holding a lock, and failure to release the lock could cause
deadlock.  Using a structured exception handling mechanism to manage
lock release helps, but race conditions are still potentially an issue
with asynchronous interrupts.]

luke

-- 
Luke Tierney
University of Minnesota                      Phone:           612-625-7843
School of Statistics                         Fax:             612-624-8868
313 Ford Hall, 224 Church St. S.E.           email:      luke@stat.umn.edu
Minneapolis, MN 55455 USA                    WWW:  http://www.stat.umn.edu