[Rd] save.image Non-responsive to Interrupt

Henrik Bengtsson henr|k@bengt@@on @end|ng |rom gm@||@com
Wed May 3 01:25:41 CEST 2023


Along the lines of calling R_CheckUserInterrupt() only onces in a while:

> OTOH, in the past we have had to *disable*  R_CheckUserInterrupt()
> in parts of R's code because it was too expensive,
> {see current src/main/{seq.c,unique.c}  for a series of commented-out
> R_CheckUserInterrupt() for such speed-loss reasons}

First, here are links to these two files viewable online:

 * https://github.com/wch/r-source/blob/trunk/src/main/seq.c

 * https://github.com/wch/r-source/blob/trunk/src/main/unique.c

When not commented out, R_CheckUserInterrupt() would have been called
every 1,000,000 times per:

  /* interval at which to check interrupts */
  #define NINTERRUPT 1000000

and

  if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt()

in each iteration. That '(i+1) % NINTERRUPT == 0' expression can be
quite expensive too.  However, if we change the code to use NINTERRUPT
= 2^k where k = {1, 2, ...}, say

  #define NINTERRUPT 1048576

the compiler would optimize the condition to use "the modulo of powers
of 2 can alternatively be expressed as a bitwise AND operation"
(Thomas Lumley, 2015-06-15).  The speedup is quite impressive, cf.
<https://www.jottr.org/2015/06/05/checkuserinterrupt/>.
Alternatively, one can increment a counter and reset to zero after
calling R_CheckUserInterrupt(); I think that's equally performant.

Regarding making serialize() / unserialize() interruptible: I think
can be a good idea since we work with larger objects these days.
However, if we implement this, we probably have to consider what
happens when an interrupt happens. For example, transfers between a
client and a server are no longer atomic at this level, which means we
might end up in a corrupt state. This may, for instance, happen to
database transactions, and PSOCK parallel worker communication.  A
quick fix would be to use base::suspendInterrupts(), but better would
of course be to handle interrupts gracefully.

My $.02 + $0.02

/Henrik

On Tue, May 2, 2023 at 3:56 PM Jeroen Ooms <jeroenooms using gmail.com> wrote:
>
> On Tue, May 2, 2023 at 3:29 PM Martin Maechler
> <maechler using stat.math.ethz.ch> wrote:
> >
> > >>>>> Ivan Krylov
> > >>>>>     on Tue, 2 May 2023 14:59:36 +0300 writes:
> >
> >     > В Sat, 29 Apr 2023 00:00:02 +0000
> >     > Dario Strbenac via R-devel <r-devel using r-project.org> пишет:
> >
> >     >> Could save.image() be redesigned so that it promptly responds to
> >     >> Ctrl+C? It prevents the command line from being used for a number of
> >     >> hours if the contents of the workspace are large.
> >
> >     > This is ultimately caused by serialize() being non-interruptible. A
> >     > relatively simple way to hang an R session for a long-ish time would
> >     > therefore be:
> >
> >     > f <- xzfile(nullfile(), 'a+b')
> >     > x <- rep(0, 1e9) # approx. 8 gigabytes, adjust for your RAM size
> >     > serialize(x, f)
> >     > close(f)
> >
> >     > This means that calling R_CheckUserInterrupt() between saving
> >     > individual objects is not enough: R also needs to check for interrupts
> >     > while saving sufficiently long vectors.
> >
> >     > Since the serialize() infrastructure is carefully written to avoid
> >     > resource leaks on allocation failures, it looks relatively safe to
> >     > liberally sprinkle R_CheckUserInterrupt() where it makes sense to do
> >     > so, i.e. once per WriteItem() (which calls itself recursively and
> >     > non-recursively) and once per every downstream for loop iteration.
> >     > Valgrind doesn't show any new leaks if I apply the patch, interrupt
> >     > serialize() and then exit. R also passes make check after the applied
> >     > patch.
> >
> >     > Do these changes make sense, or am I overlooking some other problem?
> >
> > Thank you, Ivan!
> >
> > They do make sense... but :
> >
> > OTOH, in the past we have had to *disable*  R_CheckUserInterrupt()
> > in parts of R's code because it was too expensive,
> > {see current src/main/{seq.c,unique.c}  for a series of commented-out
> >  R_CheckUserInterrupt() for such speed-loss reasons}
> >
> > so  adding these may need a lot of care when we simultaneously
> > want to remain  efficient for "morally valid" use of serialization...
> > where we really don't want to pay too much of a premium.
>
> Alternatively, one could consider making R throttle or debounce calls
> to R_CheckUserInterrupt such that a repeated calls within x time are
> ignored, cf: https://www.freecodecamp.org/news/javascript-debounce-example/
>
> The reasoning being that it may be difficult for (contributed) code to
> determine when/where it is appropriate to check for interrupts, given
> varying code paths and cpu speed. Maybe it makes more sense to call
> R_CheckUserInterrupt frequently wherever it is safe to do so, and let
> R decide if reasonable time has elapsed to actually run the (possibly
> expensive) ui check again.
>
> Basic example: https://github.com/r-devel/r-svn/pull/125/files
>
>
>
>
> >
> > {{ saving the whole user workspace is not "valid" in that sense
> >    in my view.  I tell all my (non-beginner) Rstudio-using
> >    students they should turn *off* the automatic saving and
> >    loading at session end / beginning; and for reproducibility
> >    only saveRDS() [or save()] *explicitly* a few precious
> >    objects ..
> > }}
> >
> > Again, we don't want to punish people who know what they are
> > doing, just because other R users manage to hang their R session
> > by too little thinking ...
> >
> > Your patch adds 15 such interrupt checking calls which may
> > really be too much -- I'm not claiming they are: with our
> > recursive objects it's surely not very easy to determine the
> > "minimally necessary" such calls.
> >
> > In addition, we may still consider adding an extra optional
> > argument, say   `check.interrupt = TRUE`
> > which we may default to TRUE when  save.image() is called
> > but e.g., not when serialize() is called..
> >
> > Martin
> >
> >     > --
> >     > Best regards,
> >     > Ivan
> >     > x[DELETED ATTACHMENT external: Rd_IvanKrylov_interrupt-serialize.patch, text/x-patch]
> >     > ______________________________________________
> >     > R-devel using r-project.org mailing list
> >     > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list