[R] gzfile() produces large files

Prof Brian D Ripley ripley at stats.ox.ac.uk
Thu Jun 28 00:44:04 CEST 2001


Here are some more experiments:

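# t1: write() straight to the gzfile connection
zz <- gzfile("t1.gz", "w")
write(1:1000, zz)
close(zz)

# t2: a single writeLines() call on a character vector
zz <- gzfile("t2.gz", "w")
writeLines(as.character(1:1000), zz)
close(zz)

# t3: binary write of the integers (4 bytes each);
# "wb" because writeBin() expects a binary-mode connection
zz <- gzfile("t3.gz", "wb")
writeBin(1:1000, zz)
close(zz)

# t4: collect the write() output in the character vector `out` first,
# then send it to the gzfile connection with one writeLines() call
zz <- textConnection("out", "w")
write(1:1000, zz)
close(zz)
zz <- gzfile("t4.gz", "w")
writeLines(out, zz)
close(zz)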

ls -l
-rw-r--r--    1 ripley   Administ    15913 Jun 27 23:20 t1.gz
-rw-r--r--    1 ripley   Administ     1848 Jun 27 23:20 t2.gz
-rw-r--r--    1 ripley   Administ     1434 Jun 27 23:20 t3.gz
-rw-r--r--    1 ripley   Administ     1856 Jun 27 23:20 t4.gz

All are 3893 bytes uncompressed except t3, which is 4000.  The problem with
the first is that it writes in very small pieces,

1 \n 2 \n 3 \n 4 \n  ...

and as the output is trying for no latency, it has too little
opportunity to compress.
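
To illustrate why tiny pieces hurt, here is a rough sketch using memCompress() from base R. It is only an analogy, not what gzfile() does internally: it assumes each number and each separator is compressed entirely on its own, so every piece pays the fixed per-stream overhead. The names `pieces`, `piecewise` and `whole` are made up for the example.

```r
x <- 1:1000

# Assumed for illustration: every number and every newline arrives as
# its own tiny piece ("1", "\n", "2", "\n", ...)
pieces <- as.vector(rbind(as.character(x), "\n"))

# Compress each piece independently and add up the compressed sizes
piecewise <- sum(vapply(
    pieces,
    function(p) length(memCompress(charToRaw(p), type = "gzip")),
    integer(1)
))

# Compress exactly the same bytes as one block
whole <- length(memCompress(charToRaw(paste(pieces, collapse = "")),
                            type = "gzip"))

# Compare the three sizes in bytes
c(uncompressed = sum(nchar(pieces)), piecewise = piecewise, whole = whole)
```

The piecewise total comes out larger than the single-block result, because the compressor never sees enough context in a one- or two-byte piece to find any redundancy.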

The moral seems to be to write to gzfile connections in moderately-sized
pieces.  It's the one-byte newline writes that really do the damage here.
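
A minimal sketch of that advice, assuming the data fit comfortably in memory: build the lines as a character vector first, then pass them to the connection in one writeLines() call. The temporary file `path` and the variable names are hypothetical, and the compressed size will depend on the R and zlib versions in use.

```r
x <- 1:1000
path <- tempfile(fileext = ".gz")   # hypothetical output location

# Five values per line, the same layout write() uses for numbers by default
lines <- apply(matrix(x, ncol = 5, byrow = TRUE), 1, paste, collapse = " ")

# One call hands all the text to the gzfile connection
zz <- gzfile(path, "w")
writeLines(lines, zz)
close(zz)

# Read the file back to check the round trip
zz <- gzfile(path, "r")
back <- scan(zz, quiet = TRUE)
close(zz)
stopifnot(identical(as.integer(back), x))

# Compressed size on disk, in bytes
file.info(path)$size
```

This is the same idea as the t4 experiment, without the intermediate text connection.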



On Wed, 27 Jun 2001, Prof Brian Ripley wrote:
> On Wed, 27 Jun 2001, Uwe Ligges wrote:
>
> > I observed some strange results playing around with gzfile() [R-1.3.0,
> > WinNT 4.0]:
> >
> > At first
> >
> >   x <- 1:1000
> >   write(x, file = "c:/temp.txt")
> >
> > results in a file of about 4 kB. But
> >
> >   my.con <- gzfile("c:/temp.gz", open = "w")
> >   write(x, file = my.con)
> >   close(my.con)
> >
> > results in a file of about 16 kB.
> >
> > I expected a reduction of the size. Anyone who can tell me what went
> > wrong?
>
> My experiments concur: I do get a 15913 byte file and it is a valid gzip
> file.
>
> I've used this much more to read compressed files than write them.
> I will take a closer look at the zlib specs when I have time.
>
> Brian
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272860 (secr)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

