[R] compressing data without writing output to file

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Feb 8 07:43:58 CET 2009


What do you want the compressed R object to be?  (It is not an R 
object.)

The Omegahat package Rcompression may help you, but it returns a raw 
vector (and that has overheads such as the header: you could use its 
length if appropriate).
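
A minimal sketch of the raw-vector idea, assuming a version of R whose 
base package provides memCompress() (used here in place of 
Rcompression, whose interface is not shown in this thread).  The helper 
name compressed_size is hypothetical.  It serializes the object to a raw 
vector in memory, compresses that, and reports the length, so no file 
is created.

```r
# Hypothetical helper: size in bytes of the gzip-compressed serialized
# form of an R object, computed entirely in memory.
# Assumes base R's memCompress() is available.
compressed_size <- function(x, type = "gzip") {
  # serialize(x, NULL) returns a raw vector (it includes a small header)
  raw_obj <- serialize(x, connection = NULL)
  # memCompress() returns a raw vector; its length is the size in bytes
  length(memCompress(raw_obj, type = type))
}

x <- matrix(0, nrow = 100, ncol = 100)
compressed_size(x)
```

Note that this measures the compressed serialized form, not a 
compressed character representation, and it still allocates an R 
object for every result.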

On Sat, 7 Feb 2009, Markus Loecher wrote:

> This might seem like a strange question

It is more than a little imprecise ....

> but is there any way to compress an
> R object (such as a matrix) and know its resulting size in bytes ?
> Clearly, I could implement this in the following way (if x is my matrix):
>      zz <- gzfile(fname,"w");
>      write.table(x,zz);
>      close(zz);
>      file.info(fname)[,"size"];

Hmm, that calculates the size of a compressed character representation 
of the object.  So do you want the size of an object or of its 
character representation?  object.size() calculates the first.
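
To make the distinction concrete, here is a rough sketch that computes 
both quantities in memory.  It assumes base R's memCompress() is 
available, and the variable names are illustrative only.  The text is 
captured with capture.output(), so it omits the final newline that 
write.table() would write to a file, and the compressed size will be 
close to, not identical to, the gzfile() result.

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)

# Size of the object itself, as estimated by object.size()
obj_bytes <- as.numeric(object.size(x))

# Character representation as write.table() would produce it,
# captured in memory instead of being written to a file
txt <- paste(capture.output(write.table(x)), collapse = "\n")
txt_bytes <- nchar(txt, type = "bytes")

# Size of that character representation after gzip compression
txt_gz_bytes <- length(memCompress(charToRaw(txt), type = "gzip"))

c(object = obj_bytes, text = txt_bytes, text_gzip = txt_gz_bytes)
```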

> However, I need to do this for hundreds of thousands of objects and the
> overhead in terms of disk access due to the actual file creation is
> prohibitive.

The overheads of finding a character representation and of allocating 
an R object for the result would also be large.

> I guess, I would like a modified object.size() function that returns the
> size of the compressed (e.g. gzip) version of the object.

I don't see the point of calculating the size of something you will 
not use.  And anything involving 'hundreds of thousands of objects' is 
better done in C code.  So why not just write a C function to do 
whatever it is you really want (but have not told us).

In fact the way lazy-loading is implemented is pretty close to what 
you describe -- that uses an on-disk database and is not slow for 
100,000 objects.

> Thanks!
>
> Markus
>
> 	[[alternative HTML version deleted]]

PLEASE do read the posting guide (belatedly) and, as you were asked, 
do not send HTML.

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



