[Rd] improving the performance of install.packages

Joshua Bradley jgbr@d|ey1 @end|ng |rom gm@||@com
Fri Nov 8 20:55:30 CET 2019


I could do this, and I have before. This raises a more fundamental
question, though. You're asking me to write code that changes the logic of
the installation process (i.e. to write my own package installer). Instead
of doing that, I would rather integrate that logic into R itself and improve
the baseline installation process. This API change would be additive and
would not break legacy code.

Package managers like pip (Python), conda (Python), yum (CentOS), apt
(Ubuntu), and apk (Alpine) are all "smart" enough, by default, not to
download and install a package again when it is already installed. By
proposing this change, I'm essentially asking that R follow some of the same
conventions and best practices that other package managers have adopted over
the decades.
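For comparison, the skip-if-installed check those tools apply by default can be approximated in an R script today. This is a minimal sketch, assuming that "installed" simply means the package name appears in installed.packages() (versions are not compared); missing_packages() and install_if_missing() are hypothetical names, not functions in utils:

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed in
# any library on .libPaths(). Only package names are compared, not versions.
missing_packages <- function(pkgs,
                             installed = rownames(utils::installed.packages())) {
  setdiff(pkgs, installed)
}

# Hypothetical wrapper: call install.packages() only for missing packages, so
# repeating a call (or asking for a package that an earlier install already
# pulled in as a dependency) does not download and build it again.
install_if_missing <- function(pkgs, ...) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0L) {
    utils::install.packages(todo, ...)
  }
  invisible(todo)
}
```

With this, install_if_missing("tidyverse") followed later by install_if_missing("dplyr") installs nothing on the second call, because dplyr was installed as a dependency of tidyverse. The trade-off is that an outdated copy is silently kept, which is exactly the behaviour a force argument would let the caller override.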

I assumed this list is the place to discuss proposals like this for the R
codebase. If I'm on the wrong list, please let me know.

P.S. If this change were made, it would be interesting to study its effect
on bandwidth across all CRAN mirrors. A significant drop would translate
into real money saved.

Josh Bradley


On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 08/11/2019 2:06 a.m., Joshua Bradley wrote:
> > Hello,
> >
> > Currently, if you install a package twice:
> >
> > install.packages("testit")
> > install.packages("testit")
> >
> > R will build the package from source (depending on what OS you're using)
> > twice by default. This becomes especially burdensome when people are using
> > big packages (e.g. ones with many dependencies) and someone has a script
> > with:
> >
> > install.packages("tidyverse")
> > ...
> > ... later on down the script
> > ...
> > install.packages("dplyr")
> >
> > In this case, "dplyr" is installed as part of the tidyverse, so it will be
> > installed twice. As the primary "package manager" for R, install.packages
> > should not install a package twice (by default) when this can be checked
> > so easily. Indeed, many people resort to writing a few lines of code to
> > filter out already-installed packages. An r-help post from 2010 proposed
> > improving the default behavior by adding "force=FALSE" as an API addition
> > to install.packages
> > (https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html).
> >
> > Would the R-core devs still consider this proposal?
>
> Whether or not they'd do it, it's easy for you to do it.
>
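> install.packages <- function(pkgs, ..., force = FALSE) {
>    if (!force) {
>      # Drop packages whose namespace can already be loaded
>      pkgs <- Filter(Negate(requireNamespace), pkgs)
>    }
>    # Nothing left to install
>    if (length(pkgs) == 0L) return(invisible())
>    utils::install.packages(pkgs, ...)
> }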
>
> You might want to make this more elaborate, e.g. doing update.packages()
> on the ones that exist.  But really, isn't the problem with the script
> you're using, which could have done a simple test before forcing a slow
> install?
>
> Duncan Murdoch
>
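Duncan's suggestion of calling update.packages() on the packages that already exist could look roughly like the following. This is a sketch only: install_or_update() and split_by_installed() are hypothetical names, and it assumes that update.packages(oldPkgs = ..., ask = FALSE) is an acceptable way to refresh already-installed packages without prompting, and that `...` only carries arguments both install.packages() and update.packages() accept (such as repos).

```r
# Hypothetical helper: split `pkgs` into packages that still need a first
# install ("absent") and packages already present in .libPaths() ("present").
# Only names are compared; installed versions are not inspected here.
split_by_installed <- function(pkgs,
                               installed = rownames(utils::installed.packages())) {
  list(absent = setdiff(pkgs, installed),
       present = intersect(pkgs, installed))
}

# Hypothetical wrapper: install what is missing and hand what already exists
# to update.packages(), which only reinstalls a package when the repository
# offers a newer version than the installed one.
# Assumption: `...` holds only arguments valid for both functions (e.g. repos).
install_or_update <- function(pkgs, ..., force = FALSE) {
  if (force) {
    # Current default behaviour: (re)install everything that was requested.
    return(invisible(utils::install.packages(pkgs, ...)))
  }
  parts <- split_by_installed(pkgs)
  if (length(parts$absent) > 0L) {
    utils::install.packages(parts$absent, ...)
  }
  if (length(parts$present) > 0L) {
    # ask = FALSE suppresses the interactive per-package prompt.
    utils::update.packages(oldPkgs = parts$present, ask = FALSE, ...)
  }
  invisible(parts)
}
```

With force = FALSE, repeating install_or_update(c("tidyverse", "dplyr")) downloads nothing unless a newer version of one of those packages is available; force = TRUE restores the unconditional reinstall.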



