[Rd] Fixing the HDF5 package: the on.exit mystery

H C Pumphrey H.C.Pumphrey at ed.ac.uk
Fri Mar 4 11:35:08 CET 2011


Dear all,

I'm trying to fix a subtle bug in the hdf5 package. This package provides an 
interface to the HDF5 library and hence allows one to load data into R from 
files in the HDF5 format. The bug appeared during a period in which R changed 
but the package did not.

I include below both the R and C code, stripped of everything except what is 
needed to show the bug. What is supposed to happen is

(*) the user calls R function hdf5load()
(*) hdf5load() calls C function do_hdf5load()
(*) do_hdf5load() opens the HDF5 file, recording its file id (fid)
(*) do_hdf5load() calls C function setup_onexit(), passing fid to it
(*) setup_onexit() sets up the on.exit call to be R function hdf5cleanup() 
with fid as its argument
(*) C function do_hdf5load() walks the HDF5 file's tree structure of groups 
of groups of [...] of datasets, mapping them to an R list of lists of [...] of 
array variables. This recursive procedure may have a variety of exit points 
buried inside itself.
(*) C function do_hdf5load() exits for some reason. R function hdf5load() 
therefore exits but before doing so it calls its on.exit code (which is 
hdf5cleanup(fid) with the right value of fid), closing the file.

The problem is that when do_hdf5load() and hdf5load() exit, hdf5cleanup() is 
usually not called, so the file is left open. You might not notice this, but 
if you are processing a few years' worth of data, stored at one file per day, 
you may reach the system limit on the number of open files and be unable to 
open any more.
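
To make the failure mode concrete, here is a minimal C sketch that is not 
part of the hdf5 package. It assumes a POSIX system where /dev/null exists. 
The helper names count_successful_opens() and lower_open_file_limit() are 
made up for illustration, and plain fopen() stands in for H5Fopen(). It shows 
that handles which are never closed soon exhaust the per-process limit, 
whereas handles closed promptly do not.

```c
#include <stdio.h>
#include <sys/resource.h>

#define MAX_ATTEMPTS 256

/* Hypothetical helper: lower the soft limit on open files so that the
   effect shows up after a few dozen opens. Returns 0 on success. */
int lower_open_file_limit(rlim_t new_soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (new_soft > rl.rlim_max)
        return -1;
    rl.rlim_cur = new_soft;
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/* Hypothetical helper: try to open /dev/null `attempts` times (capped at
   MAX_ATTEMPTS) and return how many opens succeeded. With close_each == 0
   the handles are deliberately kept open until the loop is over, which
   imitates a cleanup routine that is never called. */
int count_successful_opens(int attempts, int close_each)
{
    FILE *held[MAX_ATTEMPTS];
    int n_held = 0;
    int n_ok = 0;

    if (attempts > MAX_ATTEMPTS)
        attempts = MAX_ATTEMPTS;

    for (int i = 0; i < attempts; i++) {
        FILE *f = fopen("/dev/null", "r");
        if (f == NULL)
            break;                  /* limit on open files reached */
        n_ok++;
        if (close_each)
            fclose(f);              /* well-behaved: close straight away */
        else
            held[n_held++] = f;     /* leaky: keep the handle open */
    }

    /* Release whatever was held so the process can carry on afterwards. */
    for (int i = 0; i < n_held; i++)
        fclose(held[i]);

    return n_ok;
}
```

After lower_open_file_limit(64), a call such as count_successful_opens(200, 1) 
should manage all 200 opens, while count_successful_opens(200, 0) should stop 
well short of that.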

I suspect that the problem dates from a change made in R 2.8.0. The "Details" 
section of help(on.exit) notes: "Where ‘expr’ was evaluated changed in R 
2.8.0 ..." But it is not clear how I should modify the C code to force 
hdf5cleanup() to be reliably called when do_hdf5load() exits.
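
For reference, the behaviour I am after can be modelled in plain C without R 
or HDF5. The sketch below is only an illustration under stated assumptions: 
fake_open(), fake_close(), walk() and load_with_cleanup() are made-up 
stand-ins, and longjmp() is used to loosely imitate an error that leaves the 
C code without returning normally. It shows a single cleanup point that is 
reached however the tree walk ends. It does not show how to obtain the same 
guarantee through the R API.

```c
#include <setjmp.h>

/* All names below are hypothetical stand-ins, not hdf5 package code. */

static int open_files = 0;       /* number of pretend files currently open */
static jmp_buf error_exit;       /* where a simulated error jumps to */

int fake_open(void)              /* stand-in for H5Fopen() */
{
    open_files++;
    return 42;                   /* pretend file id */
}

void fake_close(int fid)         /* stand-in for H5Fclose() */
{
    (void) fid;
    open_files--;
}

int open_file_count(void)
{
    return open_files;
}

/* Stand-in for the recursive tree walk. It counts the leaves of a binary
   tree of the given depth, and bails out with longjmp() when it reaches
   depth fail_at. Pass fail_at = -1 for a walk that never fails. */
int walk(int fid, int depth, int fail_at)
{
    if (depth == fail_at)
        longjmp(error_exit, 1);  /* simulated error, no normal return */
    if (depth == 0)
        return 1;                /* a leaf, i.e. one dataset */
    return walk(fid, depth - 1, fail_at) + walk(fid, depth - 1, fail_at);
}

/* Open, walk, and close. Returns the number of leaves visited, or -1 if
   the walk bailed out. The close is reached on both paths. */
int load_with_cleanup(int depth, int fail_at)
{
    volatile int result = -1;    /* volatile: inspected after longjmp() */
    int fid = fake_open();

    if (setjmp(error_exit) == 0)
        result = walk(fid, depth, fail_at);

    fake_close(fid);             /* the single cleanup point */
    return result;
}
```

With this arrangement load_with_cleanup(3, -1) visits all eight leaves, 
load_with_cleanup(3, 1) reports failure, and open_file_count() is zero after 
either call.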

Any help appreciated.

Hugh (possibly the nearest thing to a maintainer that the hdf5 package 
currently has)

(R and C code follow)

#----------------------------------------------------------------
"hdf5load" <-  function (file, load = TRUE, verbosity = 0, tidy = FALSE)
{
   call <- sys.call()
   .External("do_hdf5load", call, sys.frame(sys.parent()), file, load,
             as.integer (verbosity), as.logical(tidy),
             PACKAGE="hdf5")
}

"hdf5cleanup" <- function (fid)
{
   call <- sys.call()
   print("In hdf5cleanup: calling do_hdf5cleanup")
   invisible(.External("do_hdf5cleanup", call, sys.frame(sys.parent()), fid,
             PACKAGE="hdf5"))
}
#----------------------------------------------------------------


/*---------------------------------------------------------------*/
SEXP do_hdf5load (SEXP args)
{
   /* Code to process args snipped */
   if ((fid = H5Fopen (path, H5F_ACC_RDONLY, H5P_DEFAULT)) < 0)
      errorcall (call, "unable to open HDF file: %s", path);

   setup_onexit (fid, env);
   /* Messy code to walk tree structure of file snipped */
}

/* The following function shown in its entirety */
void setup_onexit (hid_t fid, SEXP env)
{
   eval (lang2 (install ("on.exit"),
                lang2 (install ("hdf5cleanup"),
                       ScalarInteger (fid))),
         env);
}

SEXP
do_hdf5cleanup (SEXP args)
{
   /* Code to process args snipped */
   /* various cleanup things done including this: */
   H5Fclose (fid);
}
/*---------------------------------------------------------------*/
