[R] hdf5 package segfault when processing large data

William Dunlap wdunlap at tibco.com
Mon Aug 24 19:41:28 CEST 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Budi Mulyono
> Sent: Monday, August 24, 2009 3:38 AM
> To: r-help at r-project.org
> Subject: [R] hdf5 package segfault when processing large data
> 
> Hi there,
> 
> I am currently working on something that uses the hdf5 library. I think
> HDF5 is a great data format; I've used it fairly extensively in
> Python via PyTables, and I was looking for something similar in R.
> The closest I could find is the hdf5 package. While it does not work
> the same way as PyTables, it's good enough to let the two exchange
> data via HDF5 files.
> 
> There is just one problem: I keep getting a segfault when trying to
> process large files (>10MB), although this is by no means large by
> HDF5 standards. I have included the example code and data below.
> I have tried different OSes (WinXP and Ubuntu 8.04), architectures
> (32- and 64-bit) and R versions (2.7.1, 2.7.2, and 2.9.1), but all
> of them show the same problem. I was wondering if anyone has any
> clue as to what's going on here and could advise me on how to
> handle it.

This sort of problem should be sent to the package's maintainer.
   > packageDescription("hdf5")
   Package: hdf5
   Version: 1.6.9
   Title: HDF5
   Author: Marcus G. Daniels mdaniels at lanl.gov
   Maintainer: Marcus G. Daniels <mdaniels at lanl.gov>
   Description: Interface to the NCSA HDF5 library
   ...

This is probably due to the code in hdf5.c allocating a huge
matrix, buf, on the stack with

    883           unsigned char buf[rowcount][size];

It dies with a segmentation fault (a stack overflow, in particular)
at line 898, where it tries to access buf.

    885           for (ri = 0; ri < rowcount; ri++)
    886             for (pos = 0; pos < colcount; pos++)
    887               {
    888                 SEXP item = VECTOR_ELT (val, pos);
    889                 SEXPTYPE type = TYPEOF (item);
    890                 void *ptr = &buf[ri][offsets[pos]];
    891
    892                 switch (type)
    893                   {
    894                   case REALSXP:
    895                     memcpy (ptr, &REAL (item)[ri], sizeof (double));
    896                     break;
    897                   case INTSXP:
    898                     memcpy (ptr, &INTEGER (item)[ri], sizeof (int));
    899                     break;

The code should use one of the allocators in the R API instead
of putting the big memory block on the stack.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

> 
> Thank you; I'd appreciate any help I can get.
> 
> Cheers,
> 
> Budi
> 
> The example script
> ====================
> library(hdf5)
> fileName <- "sample.txt"
> myTable <- read.table(fileName,header=TRUE,sep="\t",as.is=TRUE)
> hdf5save("test.hdf", "myTable")
> 
> ========
> Example data (sample.txt; the list continues for more than
> 250,000 rows):
> ========
> Date	Time	f1	f2	f3	f4	f5
> 20070328	07:56	463	463.07	462.9	463.01	1100
> 20070328	07:57	463.01	463.01	463.01	463.01	200
> ....
> 