[Rd] Fixing the HDF5 package: the on.exit mystery

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Mar 4 22:12:20 CET 2011


Looks like you should be using finalizers instead.  See the RODBC 
package for an example of this.
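
A minimal R-level sketch of the finalizer approach follows. It assumes the open file can be wrapped in an environment, because reg.finalizer() only accepts environments and external pointers. In C, the analogous calls are R_MakeExternalPtr() and R_RegisterCFinalizerEx() on an external pointer holding the fid. All names here (open_hdf5_handle(), close_hdf5_handle(), load_file(), closed_files) are hypothetical, and "closing" a file is simulated by recording its path instead of calling H5Fclose().

```r
## Hypothetical sketch: paths of "closed" files are recorded here
## instead of calling H5Fclose().
closed_files <- character(0)

## Close a handle at most once; used both explicitly and as the finalizer.
close_hdf5_handle <- function(handle) {
  if (isTRUE(handle$is_open)) {
    handle$is_open <- FALSE
    closed_files <<- c(closed_files, handle$path)
  }
  invisible(NULL)
}

## Wrap an "open file" in an environment and register a finalizer on it.
open_hdf5_handle <- function(path) {
  handle <- new.env()
  handle$path <- path
  handle$is_open <- TRUE   # stands in for a real HDF5 file id
  reg.finalizer(handle, close_hdf5_handle, onexit = TRUE)
  handle
}

## A loader that may exit early with an error, as do_hdf5load() can.
load_file <- function(path, fail = FALSE) {
  h <- open_hdf5_handle(path)
  if (fail) stop("error while walking the file")
  close_hdf5_handle(h)     # normal path: close explicitly
  "loaded"
}
```

A finalizer runs only when the handle is garbage-collected (or at the end of the session, given onexit = TRUE), not at the moment the function exits. The sketch therefore still closes explicitly on the normal path and treats the finalizer as a safety net for exits that skip that close, such as errors: after try(load_file("day002.h5", fail = TRUE)) followed by gc(), closed_files would be expected to include "day002.h5".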

On Fri, 4 Mar 2011, H C Pumphrey wrote:

> Dear all,
>
> I'm trying to fix a subtle bug in the hdf5 package. This package provides an 
> interface to the HDF5 library and hence allows one to load data into R from 
> files in the HDF5 format. The bug appeared during a period in which R changed 
> but the package did not.
>
> I include below both the R and C code, stripped of everything except what is 
> needed to show the bug. What is supposed to happen is
>
> (*) the user calls R function hdf5load()
> (*) hdf5load() calls C function do_hdf5load()
> (*) do_hdf5load() opens the HDF5 file recording its HDF5 file id (fid)
> (*) do_hdf5load() calls C function setup_onexit, passing fid to it
> (*) setup_onexit sets up the on.exit call to be R function hdf5cleanup with 
> fid as its argument
> (*) C function do_hdf5load() walks the HDF5 file's tree structure of groups 
> of groups of [...] of datasets, mapping them to an R list of lists of [...] 
> of array variables. This recursive procedure may have a variety of exit 
> points buried inside itself.
> (*) C function do_hdf5load() exits for some reason. R function hdf5load() 
> therefore exits, but before doing so it calls its on.exit code (which is 
> hdf5cleanup(fid) with the right value of fid), closing the file.
>
> The problem is that when do_hdf5load() and hdf5load() exit, hdf5cleanup() is 
> usually not called, meaning that the file is left open. You might not notice 
> this, but if you are processing a few years' worth of data, stored at one 
> file per day, you may hit the system limit on the number of open files and 
> be unable to open any more.
>
> I suspect that the problem dates to a change in R at 2.8.0. If you 
> do help(on.exit), it notes under "Details" that: "Where ‘expr’ was evaluated 
> changed in R 2.8.0 ..." But it is not clear how I should modify the C code to 
> force hdf5cleanup() to be reliably called when do_hdf5load() exits.
> Any help appreciated.
>
> Hugh (possibly the nearest thing to a maintainer that the hdf5 package 
> currently has)
>
> (R and C code follow)
>
> #----------------------------------------------------------------
> "hdf5load" <-  function (file, load = TRUE, verbosity = 0, tidy = FALSE)
> {
>  call <- sys.call()
>  .External("do_hdf5load", call, sys.frame(sys.parent()), file, load,
>            as.integer (verbosity), as.logical(tidy),
>            PACKAGE="hdf5")
> }
>
> "hdf5cleanup" <- function (fid)
> {
>  call <- sys.call()
>  print("In hdf5cleanup: calling do_hdf5cleanup")
>  invisible(.External("do_hdf5cleanup", call, sys.frame(sys.parent()), fid,
>            PACKAGE="hdf5"))
> }
> #----------------------------------------------------------------
>
>
> /*---------------------------------------------------------------*/
> SEXP do_hdf5load (SEXP args)
> {
>  /* Code to process args snipped */
>  if ((fid = H5Fopen (path, H5F_ACC_RDONLY, H5P_DEFAULT)) < 0)
>    errorcall (call, "unable to open HDF file: %s", path);
>
>  setup_onexit (fid, env);
>  /* Messy code to walk tree structure of file snipped */
> }
>
> /* The following function shown in its entirety */
> setup_onexit (hid_t fid, SEXP env)
> {
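>  /* Build on.exit(hdf5cleanup(fid)) and protect it while it is evaluated */
>  SEXP expr = PROTECT (lang2 (install ("on.exit"),
>                              lang2 (install ("hdf5cleanup"),
>                                     ScalarInteger (fid))));
>  eval (expr, env);
>  UNPROTECT (1);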
> }
>
> SEXP
> do_hdf5cleanup (SEXP args)
> {
>  /* Code to process args snipped */
>  /* various cleanup things done including this: */
>  H5Fclose (fid);
>  return R_NilValue;
> }
> /*---------------------------------------------------------------*/
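
An alternative sketch that avoids evaluating on.exit() from C at all. From R 2.8.0 onwards, on.exit() appears to attach its expression to the function call whose frame it is evaluated in. In the code above, env is sys.frame(sys.parent()) evaluated inside hdf5load(), which is the frame of hdf5load()'s caller (the global environment when called at top level), so the cleanup is plausibly being registered on the caller, or on nothing, rather than on hdf5load() itself. If the C side were split so that the R wrapper receives the fid first, the wrapper could register the cleanup with its own on.exit(), which runs on both normal return and error. The functions open_sim(), walk_sim(), hdf5cleanup_sim(), hdf5load_sim() and the variables next_fid and cleanup_log are hypothetical stand-ins for separate .External() entry points and H5Fclose().

```r
## Hypothetical stand-ins: in a reworked package these would be separate
## .External() entry points, and the cleanup would call H5Fclose(fid).
next_fid <- 0L
cleanup_log <- integer(0)

open_sim <- function(file) {
  next_fid <<- next_fid + 1L   # pretend H5Fopen() returned a new id
  next_fid
}

walk_sim <- function(fid, fail) {
  if (fail) stop("error while walking file with fid ", fid)
  "loaded"
}

hdf5cleanup_sim <- function(fid) {
  cleanup_log <<- c(cleanup_log, fid)   # pretend H5Fclose(fid) was called
  invisible(NULL)
}

## The wrapper registers the cleanup in its own frame, so it runs when
## this function exits, whether by returning or by an error.
hdf5load_sim <- function(file, fail = FALSE) {
  fid <- open_sim(file)
  on.exit(hdf5cleanup_sim(fid))
  walk_sim(fid, fail)
}
```

on.exit() is registered immediately after the open step, so that no R code that could fail runs between opening the file and arranging for it to be closed.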


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


More information about the R-devel mailing list