[Rd] Bug in memDecompress()

Olaf Mersmann olafm at kimberly.tako.de
Fri May 7 17:27:39 CEST 2010


Dear R developers,

I have discovered a bug in the lzma decompression code used by memDecompress(). It is only triggered when the uncompressed content is more than three times the size of the compressed content. Here is a simple example that reproduces it:

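  n <- 200

  # A highly repetitive string, so it compresses far better than 3:1
  char <- paste(replicate(n, "1234567890"), collapse="")
  char.comp <- memCompress(char, type="xz")
  char.dec <- memDecompress(char.comp, type="xz", asChar=TRUE)
  # TRUE if the round trip is lossless
  nchar(char.dec) == nchar(char)

  # Same check on the serialized (raw) form of the string
  raw <- serialize(char, connection=NULL)
  raw.comp <- memCompress(raw, type="xz")
  raw.dec <- memDecompress(raw.comp, type="xz")
  # TRUE if the round trip is lossless
  length(raw.dec) == length(raw)

  # Needs the complete byte stream to rebuild the string
  char.uns <- unserialize(raw.dec)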
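
To see roughly where the threshold lies, one could sweep n and record the expansion ratio next to the round-trip result. This is only a sketch: check_roundtrip is a name made up for illustration and is not part of R, and the values of n are arbitrary.

```r
# Round-trip strings of increasing redundancy through xz and report the
# expansion ratio alongside whether the original bytes came back intact.
# check_roundtrip is a hypothetical helper, not an R function.
check_roundtrip <- function(n) {
  original <- charToRaw(paste(rep("1234567890", n), collapse = ""))
  compressed <- memCompress(original, type = "xz")
  # Treat an error during decompression as a failed round trip
  restored <- tryCatch(memDecompress(compressed, type = "xz"),
                       error = function(e) raw(0))
  data.frame(n = n,
             ratio = length(original) / length(compressed),
             ok = identical(restored, original))
}

# One row per n; the n values are arbitrary sample points
res <- do.call(rbind, lapply(c(1, 5, 10, 20, 50, 200), check_roundtrip))
print(res)
```

If the description above is right, the rows with ok == FALSE on an affected build should be the ones whose ratio exceeds 3; on a build without the bug every row should report ok == TRUE.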

The root cause seems to be that lzma_code() returns LZMA_OK even when it could not decompress the whole content, presumably because the output buffer filled up first. In that case strm.avail_in is still greater than zero. The following patch changes the corresponding if statements to check for this:

  http://www.statistik.tu-dortmund.de/~olafm/temp/memdecompress.patch

The patch also includes a small fix from upstream xz for an uninitialized field in lzma_stream.

Cheers,
Olaf


