[Rd] improving the performance of install.packages

William Dunlap wdunlap at tibco.com
Fri Nov 8 21:06:43 CET 2019


While developing a package, I often run install.packages() on it many times
in a session without updating its version number.  How would your proposed
change affect this workflow?
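
To make the concern concrete, here is a minimal sketch, assuming the proposed
default skips a package whenever the installed version is at least as new as
the candidate. The helper needs_install() and the version numbers are
hypothetical, made up for illustration; nothing like this exists in utils.

```r
# Hypothetical helper: which packages would a "skip if already installed"
# default actually reinstall? `candidate` and `installed` are named
# character vectors of version strings (illustrative values only).
needs_install <- function(pkgs, candidate, installed, force = FALSE) {
  if (force) return(pkgs)  # force = TRUE always reinstalls
  keep <- vapply(pkgs, function(p) {
    if (!p %in% names(installed)) return(TRUE)  # not installed yet
    # Reinstall only if the candidate is strictly newer
    package_version(candidate[[p]]) > package_version(installed[[p]])
  }, logical(1))
  pkgs[keep]
}

installed <- c(mypkg = "0.1.0", dplyr = "0.8.3")
candidate <- c(mypkg = "0.1.0", dplyr = "0.8.3", testit = "0.11")

# Only the package that is not yet installed is selected
needs_install(c("mypkg", "dplyr", "testit"), candidate, installed)

# An unchanged version number is skipped unless force = TRUE
needs_install("mypkg", candidate, installed, force = TRUE)
```

Under that assumption, a package under development whose version number has
not been bumped would be skipped unless force = TRUE is passed every time.
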
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Nov 8, 2019 at 11:56 AM Joshua Bradley <jgbradley1 using gmail.com> wrote:

> I could do this...and I have before. This brings up a more fundamental
> question though. You're asking me to write code that changes the logic of
> the installation process (i.e. writing my own package installer). Instead
> of doing that, I would rather integrate that logic into R itself to improve
> the baseline installation process. This proposed API change would be
> additive and would not break legacy code.
>
> Package managers like pip (python), conda (python), yum (CentOS), apt
> (Ubuntu), and apk (Alpine) are all "smart" enough to know (by their
> defaults) when to not download a package again. By proposing this change,
> I'm essentially asking that R follow some of the same conventions and best
> practices that other package managers have adopted over the decades.
>
> I assumed this list is used to discuss proposals like this to the R
> codebase. If I'm on the wrong list, please let me know.
>
> P.S. if this change happened, it would be interesting to study the effect
> it has on the bandwidth across all CRAN mirrors. A significant drop would
> turn into actual $$ saved.
>
> Josh Bradley
>
>
> On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch <murdoch.duncan using gmail.com>
> wrote:
>
> > On 08/11/2019 2:06 a.m., Joshua Bradley wrote:
> > > Hello,
> > >
> > > Currently if you install a package twice:
> > >
> > > install.packages("testit")
> > > install.packages("testit")
> > >
> > > R will build the package from source (depending on what OS you're using)
> > > twice by default. This becomes especially burdensome when people are using
> > > big packages (i.e. lots of depends) and someone has a script with:
> > >
> > > install.packages("tidyverse")
> > > ...
> > > ... later on down the script
> > > ...
> > > install.packages("dplyr")
> > >
> > > In this case, "dplyr" is part of the tidyverse and will install twice. As
> > > the primary "package manager" for R, install.packages() should not install
> > > a package twice (by default) when it can be so easily checked. Indeed, many
> > > people resort to writing a few lines of code to filter out
> > > already-installed packages. An r-help post from 2010 proposed improving the
> > > default behavior by adding "force=FALSE" as an API addition to
> > > install.packages
> > > (https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html).
> > >
> > > Would the R-core devs still consider this proposal?
> >
> > Whether or not they'd do it, it's easy for you to do it.
> >
> > install.packages <- function(pkgs, ..., force = FALSE) {
> >    if (!force) {
> >      # Keep only the packages whose namespace cannot be loaded
> >      pkgs <- Filter(Negate(requireNamespace), pkgs)
> >    }
> >    utils::install.packages(pkgs, ...)
> > }
> >
> > You might want to make this more elaborate, e.g. doing update.packages()
> > on the ones that exist.  But really, isn't the problem with the script
> > you're using, which could have done a simple test before forcing a slow
> > install?
> >
> > Duncan Murdoch
> >
