[Rd] Decompressing raw vectors in memory

Hadley Wickham hadley at rice.edu
Wed May 2 18:27:04 CEST 2012


> Well, it seems what you get there depends on the client, but I did
>
> tystie% curl -o foo "http://httpbin.org/gzip"
> tystie% file foo
> foo: gzip compressed data, last modified: Wed May  2 17:06:24 2012, max
> compression
>
> and the final part worried me: I do not know if memDecompress() knows about
> that format.  The help page does not claim it can do anything other than
> de-compress the results of memCompress() (although past experience has shown
> that it can in some cases).  gzfile() supports a much wider range of
> formats.

Ah, ok.  Thanks.  In that case it's probably just as easy to save it
to a temp file and read that back.

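  con <- file(tmp) # R automatically detects compression
  open(con, "rb")
  on.exit(close(con), add = TRUE)

  # n is only an upper bound on the bytes read; the factor of 10 is a
  # guess at the compression ratio
  readBin(con, raw(), file.info(tmp)$size * 10)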

The only challenge is figuring out what n to give readBin. Is there a
good general strategy for this?  Guess based on the file size and then
iterate until result of readBin has length less than n?

  n <- file.info(tmp)$size * 2
  content <- readBin(con, raw(), n)
  n_read <- length(content)
  # a full chunk means there may be more data, so keep reading
  while (n_read == n) {
    more <- readBin(con, raw(), n)
    content <- c(content, more)
    n_read <- length(more)
  }

That's not great style, but there shouldn't be many reads.
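
For what it's worth, here is a self-contained sketch of the same
read-until-exhausted idea, with two changes that are my own assumptions
rather than anything settled in this thread: it opens the file with
gzfile() explicitly (the function mentioned above as supporting the wider
range of formats), and it stops when readBin() returns nothing instead of
comparing against n. The helper name read_all_raw and the chunk_size
argument are hypothetical, made up for illustration.

```r
# Hypothetical helper: read every decompressed byte from a compressed file.
read_all_raw <- function(path, chunk_size = 65536L) {
  con <- gzfile(path, open = "rb")
  on.exit(close(con), add = TRUE)

  chunks <- list()
  repeat {
    chunk <- readBin(con, what = raw(), n = chunk_size)
    # a zero-length result means the connection is exhausted
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
  }

  # concatenate once at the end rather than growing a vector on each read
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}

# Round trip: write some gzip-compressed bytes, then read them back.
tmp <- tempfile(fileext = ".gz")
original <- charToRaw(paste(rep("hello world", 1000), collapse = "\n"))
out <- gzfile(tmp, open = "wb")
writeBin(original, out)
close(out)

# a deliberately small chunk size, so several reads are needed
content <- read_all_raw(tmp, chunk_size = 1024L)
unlink(tmp)
```

Collecting the chunks in a list avoids needing any guess at the
decompressed size up front, at the cost of one extra (empty) read at the
end.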

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


