[Rd] improving the performance of install.packages

Duncan Murdoch murdoch.duncan at gmail.com
Fri Nov 8 22:59:01 CET 2019


On 08/11/2019 2:55 p.m., Joshua Bradley wrote:
> I could do this...and I have before. This brings up a more fundamental
> question though. You're asking me to write code that changes the logic of
> the installation process (i.e. writing my own package installer). Instead
> of doing that, I would rather integrate that logic into R itself to improve
> the baseline installation process. This proposed API change would be
> additive and would not break legacy code.

That's not true.  The current behaviour is equivalent to force=TRUE; I 
believe the proposal was to change the default to force=FALSE.

If you didn't change the default, it wouldn't help your example:  the 
badly written script would run with force=TRUE, and wouldn't benefit at all.
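
To make that concrete, here is a minimal sketch of the "additive" variant, 
with force=TRUE as the default.  It is not the real utils::install.packages; 
the names install_if_needed, installed and installer are hypothetical and 
exist only so the sketch can be run without touching a library or a mirror.

```r
# Hypothetical wrapper illustrating the "additive" proposal: force defaults
# to TRUE, which matches what install.packages() does today.
# `installed` and `installer` are illustration-only parameters (assumptions,
# not part of any real API).
install_if_needed <- function(pkgs, ..., force = TRUE,
                              installed = rownames(utils::installed.packages()),
                              installer = utils::install.packages) {
  if (!force) {
    # Skip anything that is already present in the library.
    pkgs <- setdiff(pkgs, installed)
  }
  if (length(pkgs) > 0) {
    installer(pkgs, ...)
  }
  invisible(pkgs)
}

# A stand-in installer that only records what it was asked to install.
install_log <- character(0)
fake_installer <- function(pkgs, ...) install_log <<- c(install_log, pkgs)

# An unmodified script never passes `force`, so dplyr is installed again.
install_if_needed("dplyr", installed = c("dplyr", "testit"),
                  installer = fake_installer)

# Only a caller that explicitly opts in skips the reinstall.
install_if_needed("dplyr", force = FALSE, installed = c("dplyr", "testit"),
                  installer = fake_installer)
```

After both calls install_log holds a single "dplyr", recorded by the first 
call.  A script that is never edited gets nothing from the new argument 
unless the default itself changes.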

Duncan Murdoch

> 
> Package managers like pip (python), conda (python), yum (CentOS), apt
> (Ubuntu), and apk (Alpine) are all "smart" enough to know (by their
> defaults) when to not download a package again. By proposing this change,
> I'm essentially asking that R follow some of the same conventions and best
> practices that other package managers have adopted over the decades.
> 
> I assumed this list is used to discuss proposals like this to the R
> codebase. If I'm on the wrong list, please let me know.
> 
> P.S. if this change happened, it would be interesting to study the effect
> it has on the bandwidth across all CRAN mirrors. A significant drop would
> turn into actual $$ saved
> 
> Josh Bradley
> 
> 
> On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch <murdoch.duncan using gmail.com>
> wrote:
> 
>> On 08/11/2019 2:06 a.m., Joshua Bradley wrote:
>>> Hello,
>>>
>>> Currently if you install a package twice:
>>>
>>> install.packages("testit")
>>> install.packages("testit")
>>>
>>> R will build the package from source (depending on what OS you're using)
>>> twice by default. This becomes especially burdensome when people are using
>>> big packages (i.e. lots of depends) and someone has a script with:
>>>
>>> install.packages("tidyverse")
>>> ...
>>> ... later on down the script
>>> ...
>>> install.packages("dplyr")
>>>
>>> In this case, "dplyr" is part of the tidyverse and will install twice. As
>>> the primary "package manager" for R, it should not install a package twice
>>> (by default) when it can be so easily checked. Indeed, many people resort
>>> to writing a few lines of code to filter out already-installed packages. An
>>> r-help post from 2010 proposed a way to improve the default behavior, by
>>> adding "force=FALSE" as an API addition to install.packages
>>> (https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html).
>>>
>>> Would the R-core devs still consider this proposal?
>>
>> Whether or not they'd do it, it's easy for you to do it.
>>
>> install.packages <- function(pkgs, ..., force = FALSE) {
>>     if (!force) {
>>       # keep only the packages whose namespace cannot be loaded
>>       pkgs <- Filter(Negate(requireNamespace), pkgs)
>>     }
>>     utils::install.packages(pkgs, ...)
>> }
>>
>> You might want to make this more elaborate, e.g. doing update.packages()
>> on the ones that exist.  But really, isn't the problem with the script
>> you're using, which could have done a simple test before forcing a slow
>> install?
>>
>> Duncan Murdoch
>>
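
The "more elaborate" version mentioned above might look roughly like the 
following.  This is only a sketch: plan_install and install_or_update are 
hypothetical names, and dry_run is an assumption added so that nothing is 
installed unless the caller asks for it.  It relies on the oldPkgs argument 
of update.packages() to restrict updating to the packages that already exist.

```r
# Hypothetical helper: split the requested packages into those that need a
# fresh install and those that are already present and could be updated.
plan_install <- function(pkgs,
                         installed = rownames(utils::installed.packages())) {
  pkgs <- unique(pkgs)
  list(install = setdiff(pkgs, installed),
       update  = intersect(pkgs, installed))
}

# Hypothetical driver. With dry_run = TRUE (the default in this sketch) it
# only returns the plan and touches neither the library nor the network.
install_or_update <- function(pkgs, ..., dry_run = TRUE,
                              installed = rownames(utils::installed.packages())) {
  plan <- plan_install(pkgs, installed)
  if (!dry_run) {
    if (length(plan$install) > 0) {
      utils::install.packages(plan$install, ...)
    }
    if (length(plan$update) > 0) {
      # oldPkgs limits update.packages() to the named packages.
      utils::update.packages(oldPkgs = plan$update, ask = FALSE, ...)
    }
  }
  plan
}

# Pretend only dplyr is installed and look at what would happen.
plan <- install_or_update(c("tidyverse", "dplyr"), installed = "dplyr")
```

Here plan$install is "tidyverse" and plan$update is "dplyr".  Calling it with 
dry_run = FALSE would do the actual work; any extra arguments in ... are 
passed to both install.packages() and update.packages(), so they need to be 
ones that both accept.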
> 