[BioC] rhdf5 read/write concurrency surprise

Bernd Fischer bernd.fischer at embl.de
Wed Jul 30 20:44:30 CEST 2014


Dear Brad,

I investigated this issue and found that similar issues occur with
R's file connections. I added example code at the end of this message.
Therefore, I decided to leave the behavior as is for the low-level HDF5
functions (upper-case H5… functions and HDF5 object identifiers).

However, for the HDF5 high-level functions (lower-case h5…), I added
a warning that is raised if an open HDF5 handle already exists for the
specified filename (rhdf5 >= 2.9.5).

In your example, the second call to "h5dump(hf)" should throw a warning like:

> h5dump(hf)
named list()
Warning message:
In h5checktypeOrOpenLoc(file) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'H5close()' to close all open HDF5 object handles.

and a call to H5close() should solve the problem.

> H5close()
> h5dump(hf)
$foo
[1] 1
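
The close-then-read step can be wrapped in a small helper. This is only a
minimal sketch: h5dump_fresh is a hypothetical name, not a function in rhdf5,
and the temporary file is there only to make the example self-contained.
Note that H5close() closes every open HDF5 handle in the R session, so any
identifier obtained earlier (e.g. from H5Fopen) must not be used afterwards.

```r
library(rhdf5)

# Hypothetical helper (not part of rhdf5): close all open HDF5 handles in this
# R session, then read the file by name so that a fresh handle is opened.
h5dump_fresh <- function(file) {
  H5close()      # invalidates every previously opened HDF5 identifier
  h5dump(file)   # high-level call that opens the file by name
}

# Self-contained demonstration on a temporary file (assumed for illustration).
hf <- tempfile(fileext = ".h5")
h5createFile(hf)
h5write(1, hf, "foo")

res <- h5dump_fresh(hf)
str(res)
```

In the two-process scenario, the reader would call h5dump_fresh(hf) each time
the writer has modified the file, and reopen any handles it still needs.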



On 26.06.2014, at 08:13, Brad Friedman <friedman.brad at gene.com> wrote:

> An open rhdf5 handle becomes corrupted when another process writes to the HDF5 file.
> 
> This example requires you to start two different R processes, a writer and a reader.
> 
> ## Create empty HDF5 file in the writer process
> > library(rhdf5)
> > hf <- "x.hdf5"
> > h5createFile(hf)
> [1] TRUE
> > h5dump(hf)
> named list()
> 
> 
> 
> ## Then, in the reader process open a handle and dump the file
> > library(rhdf5)
> > hf <- "x.hdf5"
> > fid <- H5Fopen(hf)
> > h5dump(fid)
> named list()
> 
> 
> ## Now leave the rhdf5 handle open in the reader and go back to the
> ## writer process and write a data set
> > h5write(1, hf, "foo")
> > h5dump(hf)
> $foo
> [1] 1
> 
> 
> ## Now go back to the reader and try to read it:
> > h5dump(fid)
> named list()
> ## That is not right---it doesn't reflect the change.
> ## Maybe the handle is bad. Try to read it using the filename instead
> > h5dump(hf)
> named list()
> ## still can't see it. Try a new rhdf5 handle
> > fid2 <- H5Fopen(hf)
> > h5dump(fid2)
> named list()
> ## Still can't see it. Turns out if I close all the open rhdf5 handles
> ## I can see it.
> > H5Fclose(fid)
> > H5Fclose(fid2)
> > h5dump(hf)
> $foo
> [1] 1
> 
> 
> 
> A workaround for this is that whenever the file is modified by the writer process, the reader process has to make sure to close all open handles for the file and then reopen fresh ones. Another workaround is to never explicitly open handles with H5Fopen, and to only use rhdf5's interface that accepts a file name instead of an open HDF5 handle.
> 
> 
> > sessionInfo()
> R version 3.1.0 Patched (2014-05-17 r65643)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] rhdf5_2.9.1
> 
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.10.0
> 


