[BioC] rhdf5, dataframes, and variable length strings

Bernd Fischer bernd.fischer at embl.de
Wed Nov 13 13:28:10 CET 2013


Dear John!

Thank you very much for reporting this bug. I can reproduce it on my computer,
but it will take some time to fix. I will let you know once it is fixed.

Best,

Bernd



On 28.10.2013, at 22:14, John at embl-heidelberg.de wrote:

> 
> Hi all.
> 
> I am working with large data frames in R that contain a mix of numbers and variable-length strings.  I've tried using the rhdf5 package to write and then read these and I haven't been able to figure out how to correctly use the package.  I'll include a toy data frame that causes R to segfault, at least on my machine.  I would greatly appreciate either some pointers about what I'm doing wrong or another way to store my data.
> 
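> # Generate n random alphanumeric strings, each 3 to 20 characters long.
> rndString <- function(n = 1) {
>   chars <- c(0:9, letters, LETTERS)
>   out <- character(n)
>   for (i in seq_len(n)) {
>     out[i] <- paste(sample(chars, sample(3:20, 1), replace = TRUE), collapse = "")
>   }
>   out
> }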
> library(rhdf5)
> n <- 1000000
> d <- data.frame(id=seq(n),name=rndString(n),val=rnorm(n),stringsAsFactors=FALSE)
> h5createFile("test.h5")
> h5write(d,file="test.h5",name="d")
> dd <- h5read("test.h5",name="d")
> 
> John Estrada
> 
> 
> 
> -- output of sessionInfo(): 
> 
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] rhdf5_2.6.0
> 
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.8.0
> 
> 
> --
> Sent via the guest posting facility at bioconductor.org.


