[R] How to checkpoint-restart R jobs in batch mode?

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Oct 14 16:29:05 CEST 2008

On Tue, 14 Oct 2008, Mizanur Khondoker wrote:

> Dear list,
> Most high-performance computing clusters and grid engines place some
> restrictions on how long a job can run in batch mode.
> The cluster I am using has a 48-hour limit, but my job would take
> far longer than that.
> I know that it is possible to checkpoint jobs without modifying the code if
> some specialized software (e.g., BLCR) is installed on the grid engine.
> However, I am looking for a solution for when this kind of facility is not
> available on the cluster, for example by modifying the code so that the
> job can checkpoint and restart by itself.
> Does anyone have any experience or ideas for doing so? Any help would be
> greatly appreciated.

Yes, we have done this for many years, generally by saving the workspace
every few hours (in our case, say, every 100 simulation runs) and making
sure that the workspace contains enough information to restart from the
save points.  This approach depends on the run returning fairly often to a
point from which it can easily be reproduced: if the simulation runs
entirely in C++ code inside a package, you have little hope.
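
A minimal sketch of the idea, under stated assumptions rather than as the
code we actually use.  The function name run_with_checkpoints(), its
arguments, the seed and the layout of the saved state are all hypothetical.
Instead of saving the whole workspace with save.image(), it saves a small
state list with saveRDS(): the results so far, the index of the next run,
and the random-number state (.Random.seed), so that a restarted job
continues the same random stream.

```r
# Hypothetical helper: run `n_runs` simulation runs, writing a checkpoint
# every `save_every` runs. `one_run` is the user's function for one run.
# `stop_after` exists only to imitate a job being killed by the scheduler.
run_with_checkpoints <- function(n_runs, save_every, checkpoint_file,
                                 one_run, stop_after = Inf) {
  if (file.exists(checkpoint_file)) {
    # Restart: pick up the saved results, position and RNG state.
    state <- readRDS(checkpoint_file)
    assign(".Random.seed", state$rng_state, envir = globalenv())
  } else {
    # Fresh start: fix the seed (an arbitrary choice) so runs are reproducible.
    set.seed(1)
    state <- list(results = numeric(n_runs), next_run = 1L, rng_state = NULL)
  }

  done_now <- 0
  while (state$next_run <= n_runs && done_now < stop_after) {
    i <- state$next_run
    state$results[i] <- one_run(i)
    state$next_run <- i + 1L
    done_now <- done_now + 1

    if (i %% save_every == 0L || i == n_runs) {
      # Record the RNG state as it is *after* run i, then write to a
      # temporary file and rename it, so that a job killed in mid-write
      # does not leave a truncated checkpoint behind.
      state$rng_state <- get(".Random.seed", envir = globalenv())
      tmp <- paste0(checkpoint_file, ".tmp")
      saveRDS(state, tmp)
      file.rename(tmp, checkpoint_file)
    }
  }
  state
}

# Typical use in a batch script that is resubmitted until it finishes:
# out <- run_with_checkpoints(1000L, 100L, "checkpoint.rds",
#                             function(i) mean(rnorm(50)))
```

Any runs completed after the last checkpoint are lost when the job is
killed and are simply repeated on restart; because the random-number state
is restored as well, the repeated runs give the same values as an
uninterrupted job would.  This only works if each run is driven from R, as
noted above.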

> -- 
> Mizanur Khondoker
> Division of Pathway Medicine (DPM)
> The University of Edinburgh Medical School
> The Chancellor's Building
> 49 Little France Crescent
> Edinburgh EH16 4SB
> United Kingdom
> Tel:  +44 (0) 131 242 6287
> Fax: +44 (0) 131 242 6244
> http://www.pathwaymedicine.ed.ac.uk/

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
