[BioC] rhdf5 write/read inconsistency

Brad Friedman [guest] guest at bioconductor.org
Wed Nov 6 15:55:06 CET 2013


I have an example of a matrix that I write with rhdf5, but when I read it back in I get something randomly different from what I wrote.

The example below demonstrates the effect, which seems to be related somehow to having small chunks. I write a matrix, then read it back in 10 times, printing its sum each time. The sum usually differs from one read to the next, and it is never correct.

library(rhdf5)
go <- function(numRow = blocksize,
               chunksize = 4,
               numCol = 3,
               dims = c(numRow, numCol),
               start = 1,
               blocksize = 7)  {
  str(list(numRow = numRow, numCol = numCol,
           start = start,
           chunksize = chunksize,
           blocksize = blocksize))

  mtx <- matrix(1:(blocksize*numCol), ncol = numCol)
  cat("sum(matrix)=", sum(mtx), "\n")

  # start from a fresh file on every run
  if (file.exists("x.hdf5")) unlink("x.hdf5")
  h5createFile("x.hdf5")
  h5createDataset(file="x.hdf5",
                  dataset = "x",
                  dims = dims,
                  H5type = "H5T_NATIVE_UINT32",
                  level = 0,
                  chunk= c(chunksize,numCol))

  h5write(mtx, "x.hdf5", name = "x",
          start = c(start, 1),
          stride = c(1,1),
          block = c(blocksize, numCol),
          count= c(1,1))

  # read the same block back 10 times and print its sum each time
  for (i in 1:10) {
    print(sum(h5read("x.hdf5", "/x",
                     start = c(start, 1),
                     stride = c(1, 1),
                     block = c(blocksize, numCol),
                     count = c(1, 1))))
  }
}


##### and the transcript:



> go()
List of 5
 $ numRow   : num 7
 $ numCol   : num 3
 $ start    : num 1
 $ chunksize: num 4
 $ blocksize: num 7
sum(matrix)= 231 
[1] 209
[1] 47358985
[1] 234
[1] 42963065
[1] 46236113
[1] 48574193
[1] 11738297
[1] 11738297
[1] 11738297
[1] 193
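
The sums only show that something is off, not which elements are wrong. A round-trip check along the following lines might narrow it down. This is a minimal sketch, not a diagnosis: roundtrip_diff is a hypothetical helper name (it is not part of rhdf5), it writes and reads the whole dataset rather than using start/block/count, and it assumes the same H5type and level settings as in go().

```r
library(rhdf5)

# Hypothetical helper (not part of rhdf5): write `m` to a fresh temporary
# file with the given chunk shape, read it back, and return the row/column
# indices of every element that does not match what was written.
roundtrip_diff <- function(m, chunk) {
  f <- tempfile(fileext = ".h5")
  on.exit(unlink(f))
  h5createFile(f)
  h5createDataset(file = f, dataset = "x", dims = dim(m),
                  H5type = "H5T_NATIVE_UINT32", level = 0, chunk = chunk)
  h5write(m, f, name = "x")
  got <- h5read(f, "/x")
  # treat NA as a mismatch too, in case a value cannot be represented
  which(is.na(got) | got != m, arr.ind = TRUE)
}

m <- matrix(1:21, ncol = 3)                 # same matrix as go() by default
print(roundtrip_diff(m, chunk = c(4, 3)))   # small chunks, as in go()
print(roundtrip_diff(m, chunk = c(7, 3)))   # one chunk covering the dataset
```

An empty result means the matrix came back intact for that chunk shape. Comparing the two calls would show whether the chunk shape really matters.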


 -- output of sessionInfo(): 

R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.5.7

loaded via a namespace (and not attached):
[1] zlibbioc_1.6.0
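
On the "small chunks" suspicion: with the default arguments of go() the chunk shape does not divide the dataset evenly. The arithmetic can be sketched in base R as below. It assumes only the usual HDF5 behaviour that chunks are allocated whole, so the last chunk along a dimension can be partly unused. The variable names are illustrative, and this describes the layout only, not the cause of the problem.

```r
# Chunk geometry for go() with its default arguments (base R only).
dims  <- c(7, 3)   # numRow x numCol of the dataset
chunk <- c(4, 3)   # chunksize x numCol, as passed to h5createDataset()

# Number of chunks needed along each dimension; the last one may be partial.
n_chunks <- ceiling(dims / chunk)

# Rows of the dataset that fall into the last row-wise chunk.
rows_in_last_chunk <- dims[1] - (n_chunks[1] - 1) * chunk[1]

# Elements allocated in chunks versus elements that belong to the dataset.
allocated <- prod(n_chunks * chunk)
used      <- prod(dims)

cat("chunks per dimension:", n_chunks, "\n")
cat("rows in last chunk:", rows_in_last_chunk, "of", chunk[1], "\n")
cat("allocated elements:", allocated, "used:", used, "\n")
```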


--
Sent via the guest posting facility at bioconductor.org.


