Using zip format for help pages, examples, etc.

Prof Brian D Ripley ripley@stats.ox.ac.uk
Thu, 25 Mar 1999 21:03:26 +0000 (GMT)


On 25 Mar 1999, Douglas Bates wrote:

> At the Vienna meeting we discussed the problems encountered on some
> operating systems when storing many small files in a directory.  In
> particular the directories $RHOME/library/base/help/,
> $RHOME/library/base/R-ex/, and $RHOME/library/base/data/ can take up
> an enormous amount of storage on the Macintosh or on Windows systems
> because the minimum amount of storage per distinct file is quite
> large.

Not just base: lme is at least as bad, and MASS and boot are large too.

> Fritz Leisch and I suggested storing the contents of each of these
> directories as a single .zip file.  This should result in considerable
> savings in size with little penalty in access speed.
> 
> To implement this we would need code for accessing files within a .zip
> archive and for decompressing the files.  I know of the files from
> Info-Zip including the zlib sources (compression/decompression) and
> the contrib/minizip directory in the zlib source tree.  The minizip
> directory provides a prototype unzip.c and unzip.h.  These would be
> used to modify the R internal functions file.exists and file.show so
> that they could look in the archive as well as in a directory.
> 
> Does anyone know of a reason why this would not be a good idea?  Does
> anyone know of better|more_portable|whatever implementations of code
> to access files within a .zip archive?  The info-zip sources are
> available at http://www.cdrom.com/pub/infozip/zlib/

Those are the ones I was looking at: they do seem portable, if slow.

However, I am not sure that file.exists and file.show should be
overloaded, and for data() you need rather more than those. What is needed,
I think, is a function to extract a file from an archive to a temporary
file, to be passed to file.show or to source or load.  There is a snag
in ensuring that temporary files get deleted, but file.show has a flag for
this, and example and data could have an on.exit call to file.remove.
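A minimal sketch of that extract-to-a-temporary-file idea follows. It is written in Python with the standard zipfile and tempfile modules, not in R or C, purely to illustrate the pattern. The names extract_to_temp and with_archived_file are hypothetical. The try/finally plays the role of on.exit(file.remove(...)), and the consumer argument stands in for file.show, source or load.

```python
import os
import tempfile
import zipfile


def extract_to_temp(archive_path, member):
    """Copy one member of a .zip archive into a fresh temporary file.

    Returns the temporary file's path; the caller must delete it.
    """
    with zipfile.ZipFile(archive_path) as zf:
        data = zf.read(member)  # raises KeyError if the member is absent
    # Keep the member's extension (if any) so downstream tools can recognise it.
    suffix = os.path.splitext(member)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    return tmp_path


def with_archived_file(archive_path, member, consumer):
    """Extract `member`, hand its temporary path to `consumer`, then clean up.

    `consumer` is a hypothetical stand-in for file.show, source or load.
    The finally clause removes the temporary file even if `consumer` fails,
    much as an on.exit(file.remove(...)) call would in R.
    """
    tmp_path = extract_to_temp(archive_path, member)
    try:
        return consumer(tmp_path)
    finally:
        os.remove(tmp_path)
```

Because extraction happens before the temporary file is created, asking for a member that is not in the archive raises KeyError and leaves nothing behind to clean up.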

One important point is that AnIndex and 00Titles are in help, and they
probably should be kept uncompressed (not least because the Perl scripts
read them).

It would be easy to mock this up using scripts. Who is doing this?
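As a rough illustration of such a mock-up script (in Python here, not Perl), the following packs every regular file in one directory, for example a copy of $RHOME/library/base/help/, into a single deflate-compressed .zip while leaving AnIndex and 00Titles as ordinary files. The function name pack_directory is hypothetical. It skips subdirectories and deletes the originals only after zipfile's testzip() reports no bad members.

```python
import os
import zipfile

# Index files that other tools read directly; leave them as plain files.
KEEP_UNCOMPRESSED = {"AnIndex", "00Titles"}


def pack_directory(dir_path, archive_path, keep=KEEP_UNCOMPRESSED):
    """Move the regular files in dir_path (except those in `keep`)
    into one compressed .zip archive.

    Returns the sorted list of member names written.
    """
    names = sorted(
        n for n in os.listdir(dir_path)
        if os.path.isfile(os.path.join(dir_path, n)) and n not in keep
    )
    with zipfile.ZipFile(archive_path, "w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        for n in names:
            zf.write(os.path.join(dir_path, n), arcname=n)
    # Re-read the archive and check every member's CRC before deleting anything.
    with zipfile.ZipFile(archive_path) as zf:
        bad = zf.testzip()
    if bad is not None:
        raise RuntimeError(f"archive check failed at member {bad!r}")
    for n in names:
        os.remove(os.path.join(dir_path, n))
    return names
```

Running this on a copy of a help directory and comparing disk usage before and after would show how much the per-file overhead actually costs on a given filesystem.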


-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._