[Rd] CRAN package sizes

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Feb 15 10:40:39 CET 2011


On Sun, 13 Feb 2011, Yihui Xie wrote:

> Regarding the reasons that make the doc directory large, I wonder if
> we can make some changes in R:

'we' cannot: only core developers can.  However, end users can 
contribute in many other ways: see below.

> 1. Use a null graphics device as the default device rather than pdf()
> when running Sweave -- this can avoid the useless Rplots.pdf:
>
> options(device = function(...) {
>    .Call("R_GD_nullDevice", PACKAGE = "grDevices")
> })
>
> This can save some time in building the vignette(s) as well. (see
> http://yihui.name/en/?p=673)
>
> However, this undocumented null device may not work for certain
> graphics. Here is an example where it fails for ggplot2:
> http://stackoverflow.com/questions/4692974/ggplot2-code-that-works-interactively-rkward-crashes-under-lyx-pgfsweave-hint/4707745#4707745
>
> Is it possible for someone to look into the null device (Dr Murrell?)
> to make it stable enough?

I don't see a bug report on that, and a patch would help expedite 
this.

> 2. Compress the PDF graphics and vignettes using third-party tools,
> among which I recommend qpdf (it's free).
>
> qpdf --stream-data=compress input.pdf output.pdf
>
> This can reduce the size of PDF files a lot without quality loss. I'm
> using this tool in the animation package to reduce the size of PDF
> animations.

*Can*, but I did say

   'There are several ways to reduce the sizes of PDFs with no loss in
    quality, e.g. Adobe Acrobat Standard/Pro.'

and qpdf is often ineffective (or worse), e.g. on package mokken.  The 
problem is that many of the large packages need images re-saved in 
some other format (or preferably re-generated in some other format).

I've added a --compact-vignettes option to R CMD build (in R-devel). 
At present it uses qpdf, but I will look out for better/additional 
options.  (I use Acrobat 9 Pro on my Mac and that has always beaten 
qpdf, often by a large margin.  But qpdf is perhaps the most readily 
available of these tools.)
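Because qpdf can leave a file the same size or even make it larger, a cautious approach is to keep its output only when it actually wins. The following is a minimal sketch, assuming a POSIX shell with qpdf on the PATH. The function name compact_pdf is hypothetical, and this is not a description of what R CMD build --compact-vignettes does internally:

```shell
# compact_pdf (hypothetical helper, not part of R or qpdf): recompress the
# streams of one PDF with qpdf and replace the original only if the result
# is strictly smaller; otherwise the original file is left untouched.
compact_pdf() {
    in=$1
    tmp="$in.qpdf-tmp"
    # qpdf exits 0 on success, 3 if it succeeded with warnings, 2 on error
    if qpdf --stream-data=compress "$in" "$tmp"; then
        status=0
    else
        status=$?
    fi
    if [ "$status" -ne 0 ] && [ "$status" -ne 3 ]; then
        rm -f "$tmp"
        echo "$in: qpdf failed (exit status $status)" >&2
        return 1
    fi
    # byte counts; tr strips the padding some wc implementations add
    old=$(wc -c < "$in" | tr -d ' ')
    new=$(wc -c < "$tmp" | tr -d ' ')
    if [ "$new" -lt "$old" ]; then
        mv "$tmp" "$in"
        echo "$in: $old -> $new bytes"
    else
        rm -f "$tmp"
        echo "$in: kept original ($old bytes; qpdf output was $new)"
    fi
}
```

For example, from a package's source directory: for f in inst/doc/*.pdf; do compact_pdf "$f"; done. This only recompresses streams. It does not re-save or regenerate embedded images, which is often where the real savings are.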

> 3. Sorry to bring up this issue again, but I don't understand why
> Sweave could not support the png() device alongside pdf() and
> postscript(). I'm willing to provide a patch if needed.

Does it need changes to R?  I believe that it just needs a 
different driver, something which could be provided in a package.

This has been raised several times (including recently) with the 
Sweave maintainer, so maybe it will happen eventually.  But a package 
would retrofit it to earlier versions of R.


>
> Thanks!
>
> Regards,
> Yihui
> --
> Yihui Xie <xieyihui at gmail.com>
> Phone: 515-294-2465 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
>
> On Sun, Feb 13, 2011 at 6:30 AM, Prof Brian Ripley
> <ripley at stats.ox.ac.uk> wrote:
>> Robin Hankin's post reminded me to post about the following recent addition
>> to 'Writing R Extensions', in the section on 'Submitting a package to CRAN':
>>
>>  Ensure that the package sources are not unnecessarily large. ...
>>  As a general rule, doc directories should not exceed 5Mb, and
>>  where data directories need to be 10Mb or more, consideration should
>>  be given to a separate package containing just the data. (Similarly
>>  for external data directories, large jar files and other libraries
>>  that need to be installed.)
>>
>> With 2800 packages on CRAN, overall size is becoming a concern: currently,
>> installing all of CRAN takes 4Gb.  As the attached (I hope) graph shows, the
>> 20 packages over 20Mb take a quarter, and those over 5Mb take half.  (And
>> this is after we have removed 100Mb from the largest installed package by
>> re-compression, and archived the second largest, so Robin's package is
>> currently the largest.)  Some of the largest packages are data/jar packages,
>> but there are 55 packages with 'doc' directories over 5Mb.  To put that in
>> perspective, PDFs of whole books with lots of figures (MASS, Paul's R
>> Graphics) are well under 5Mb.
>>
>> R CMD check in R-devel reports on large packages, and you should expect that
>> in future the sizes of submitted packages will be questioned more often.
>>
>> There are lots of different reasons why doc directories are large, but the
>> major ones are
>>
>> - installing files that are unneeded, such as Rplots.pdf and .eps
>>  figures.
>> - using PDF figures of images where PNG would be more appropriate.
>> - including less than relevant material (such as how to install R,
>>  with screenshots!)
>>
>> There are several ways to reduce the sizes of PDFs with no loss in quality,
>> e.g. Adobe Acrobat Standard/Pro.
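To see which installed packages exceed the 5Mb guideline for doc directories, a rough check is to measure each package's installed doc/ directory. The following is a minimal sketch, assuming a POSIX shell and an R library laid out as <library>/<package>/doc. The name big_doc_dirs is hypothetical, and this is not the size report that R CMD check produces:

```shell
# big_doc_dirs (hypothetical helper): list installed packages under the
# library directory given as $1 whose doc/ directory exceeds $2 kilobytes
# (default 5120, i.e. the 5Mb guideline). Sizes come from `du -sk`, so
# they are disk usage in 1024-byte blocks, not exact byte counts.
big_doc_dirs() {
    lib=$1
    limit_kb=${2:-5120}
    for doc in "$lib"/*/doc; do
        # skip the unexpanded glob and anything that is not a directory
        [ -d "$doc" ] || continue
        kb=$(du -sk "$doc" | cut -f1)
        if [ "$kb" -gt "$limit_kb" ]; then
            pkg=$(basename "$(dirname "$doc")")
            printf '%s\t%s\n' "$kb" "$pkg"
        fi
    done | sort -rn
}
```

For example, big_doc_dirs "$(Rscript -e 'cat(.libPaths()[1])')" prints the size in kilobytes and the package name, largest first, for the first library on R's search path.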

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
