[Rd] Parallel compression support for saving to rds/rdata files?

Simon Urbanek simon.urbanek at r-project.org
Thu Dec 15 16:43:12 CET 2016


> On Dec 15, 2016, at 12:08 AM, Kenny Bell <kmb56 at berkeley.edu> wrote:
> 
> Hi,
> 
> I have tried to follow the instructions in the ``save`` documentation and
> it doesn't seem to work (see below):
> 
> mydata <- do.call(rbind, rep(iris, 10000))
> con <- pipe("pigz -p8 > fname.gz", "wb");
> save(mydata, file = con); close(con) # This runs
> 
> R.utils::gunzip("fname.gz", "fname.RData", overwrite = TRUE)
> load("fname.RData") # Error: error reading from connection
> 
> First question: Should the above work?
> 


Not really. gzip is a bad example because it doesn't really support parallel compression (a gzip stream cannot be chopped into blocks by design), but you can do it with bzip2:

mydata <- do.call(rbind, rep(iris, 10000))
con <- pipe("pbzip2 -p8 > fname.bz2", "wb")
save(mydata, file = con)
close(con) 

load("fname.bz2")

You can also read it back in parallel:

load(pipe("pbzip2 -dc fname.bz2"))
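
The same pattern should carry over to saveRDS()/readRDS(), since both accept a connection and saveRDS() ignores 'compress' when given one. Below is a minimal sketch, not something R provides: the helper names bz_tool(), save_rds_bz2() and read_rds_bz2() are made up for illustration, and it assumes pbzip2 (or, failing that, plain bzip2) is on the PATH. pbzip2 autodetects the number of processors when -p is not given.

```r
# Hypothetical helpers (not part of R): write/read an RDS file through an
# external bzip2-compatible tool. Assumes pbzip2 or bzip2 is on the PATH.
bz_tool <- function() {
  # Prefer the parallel tool; fall back to serial bzip2 if it is missing.
  if (nzchar(Sys.which("pbzip2"))) "pbzip2" else "bzip2"
}

save_rds_bz2 <- function(object, path) {
  # saveRDS() writes the uncompressed serialization into the pipe;
  # the external tool compresses stdin and the shell redirects it to 'path'.
  con <- pipe(paste(bz_tool(), ">", shQuote(path)), "wb")
  on.exit(close(con))  # closing the pipe waits for the compressor to finish
  saveRDS(object, file = con)
  invisible(path)
}

read_rds_bz2 <- function(path) {
  # Decompress to stdout and unserialize directly from the pipe.
  con <- pipe(paste(bz_tool(), "-dc", shQuote(path)), "rb")
  on.exit(close(con))
  readRDS(con)
}

# Round trip on a small data set.
path <- tempfile(fileext = ".rds.bz2")
save_rds_bz2(iris, path)
restored <- read_rds_bz2(path)
stopifnot(identical(restored, iris))
```

This hides the pipe from the caller, but it still depends on an external binary, so it is a convenience wrapper rather than a portable replacement for the built-in 'compress' options.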

Cheers,
Simon



> Second question: Is it possible to make this dummy friendly by allowing
> "pigz" as an option for ``compress`` in saveRDS and save? And in such a way
> that the decompressing is hidden from the user like normal?
> 
> Thanks!
> Kenny
> 
> 
> -- 
> Kendon Bell
> Email: kmb56 at berkeley.edu
> Phone: (510) 612-3375
> 
> Ph.D. Candidate
> Department of Agricultural & Resource Economics
> University of California, Berkeley
> 


