[Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

Henrik Bengtsson henrik.bengtsson at gmail.com
Wed May 11 02:44:51 CEST 2016


Isn't the problem in Qin's example that unloadNamespace("scde") only
unloads 'scde' but none of the package dependencies that were loaded
along with 'scde'?  For example:

$ R --vanilla
> ns0 <- loadedNamespaces()
> dlls0 <- getLoadedDLLs()

> packageDescription("scde")[c("Depends", "Imports")]
$Depends
[1] "R (>= 3.0.0), flexmix"

$Imports
[1] "Rcpp (>= 0.10.4), RcppArmadillo (>= 0.5.400.2.0), mgcv, Rook, rjson, MASS,
 Cairo, RColorBrewer, edgeR, quantreg, methods, nnet, RMTstat, extRemes, pcaMet
hods, BiocParallel, parallel"

> loadNamespace("scde")
> ns1 <- loadedNamespaces()
> dlls1 <- getLoadedDLLs()

> nsAdded <- setdiff(ns1, ns0)
> nsAdded
 [1] "flexmix"       "Rcpp"          "edgeR"         "splines"
 [5] "BiocGenerics"  "MASS"          "BiocParallel"  "scde"
 [9] "lattice"       "rjson"         "brew"          "RcppArmadillo"
[13] "minqa"         "distillery"    "car"           "tools"
[17] "Rook"          "Lmoments"      "nnet"          "parallel"
[21] "pbkrtest"      "RMTstat"       "grid"          "Biobase"
[25] "nlme"          "mgcv"          "quantreg"      "modeltools"
[29] "MatrixModels"  "lme4"          "Matrix"        "nloptr"
[33] "RColorBrewer"  "extRemes"      "limma"         "pcaMethods"
[37] "stats4"        "SparseM"       "Cairo"

> dllsAdded <- setdiff(names(dlls1), names(dlls0))
> dllsAdded
 [1] "Cairo"         "parallel"      "limma"         "edgeR"
 [5] "MASS"          "rjson"         "Rcpp"          "grid"
 [9] "lattice"       "Matrix"        "SparseM"       "quantreg"
[13] "nnet"          "nlme"          "mgcv"          "Biobase"
[17] "pcaMethods"    "splines"       "minqa"         "nloptr"
[21] "lme4"          "extRemes"      "RcppArmadillo" "tools"
[25] "Rook"          "scde"


If you unload these added namespaces, I think the DLLs will also be
detached; or at least they should be, provided each package implements
an .onUnload() hook that calls dyn.unload().  More on this below.
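For reference, such a hook is only a few lines in a package's R code.
A minimal sketch ("mypkg" is a placeholder for the actual package
name):

```r
## Minimal .onUnload() hook as it could appear in a package's R code;
## "mypkg" is a placeholder for the actual package name.
.onUnload <- function(libpath) {
  ## Unload the package's DLL when its namespace is unloaded
  library.dynam.unload("mypkg", libpath)
}
```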


To unload these added namespaces (with their DLLs), they have to be
unloaded in an order that does not break the dependency graph of the
currently loaded packages; otherwise you'll get errors such as:

> unloadNamespace("quantreg")
Error in unloadNamespace("quantreg") :
  namespace 'quantreg' is imported by 'car', 'scde' so cannot be unloaded

I don't know whether there exists a function that unloads namespaces
in the proper order, but here is a brute-force version:

unloadNamespaces <- function(ns, ...) {
  ## Repeatedly try to unload the given namespaces until no further
  ## progress is made
  while (length(ns) > 0) {
    ns0 <- loadedNamespaces()
    for (name in ns) {
      ## Ignore errors from namespaces still imported by others;
      ## they will be retried in a later iteration
      try(unloadNamespace(name), silent=TRUE)
    }
    ns1 <- loadedNamespaces()
    ## No namespace was unloaded in this round? Then give up
    if (identical(ns1, ns0)) break
    ## Keep only those namespaces that are still loaded
    ns <- intersect(ns, ns1)
  }
  if (length(ns) > 0) stop("Failed to unload namespace: ",
                           paste(sQuote(ns), collapse=", "))
} # unloadNamespaces()


When I run the above on R 3.3.0 patched on Windows, I get:

> unloadNamespaces(nsAdded)
now dyn.unload("C:/Users/hb/R/win-library/3.3/scde/libs/x64/scde.dll") ...
> ns2 <- loadedNamespaces()
> dlls2 <- getLoadedDLLs()
> ns2
[1] "grDevices" "utils"     "stats"     "datasets"  "base"      "graphics"
[7] "methods"
> identical(sort(ns2), sort(ns0))
[1] TRUE


However, there are some namespaces for which the DLLs are still loaded:

> sort(setdiff(names(dlls2), names(dlls0)))
 [1] "Cairo"         "edgeR"         "extRemes"      "minqa"
 [5] "nloptr"        "pcaMethods"    "quantreg"      "Rcpp"
 [9] "RcppArmadillo" "rjson"         "Rook"          "SparseM"


If we look for .onUnload() in the packages that load DLLs, we find
that the following do not have an .onUnload() and therefore most
likely never call dyn.unload() when the package is unloaded:

> sort(dllsAdded[!sapply(dllsAdded, FUN=function(pkg) {
+   ns <- getNamespace(pkg)
+   exists(".onUnload", envir=ns, inherits=FALSE)
+ })])
 [1] "Cairo"         "edgeR"         "extRemes"      "minqa"
 [5] "nloptr"        "pcaMethods"    "quantreg"      "Rcpp"
 [9] "RcppArmadillo" "rjson"         "Rook"          "SparseM"


That doesn't look like a coincidence to me.  Maybe `R CMD check`
should, in addition to checking that the namespace of a package can be
unloaded, also assert that unloading the namespace unloads whatever
DLLs the package loaded.  Something like:

* checking whether the namespace can be unloaded cleanly ... WARNING
  Unloading the namespace does not unload DLL

At least, I don't see this being tested for currently; see e.g.
https://cran.r-project.org/web/checks/check_results_Cairo.html and
https://cran.r-project.org/web/checks/check_results_Rcpp.html.
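Such a check could be approximated in just a few lines.  A rough
sketch (the function name checkDllUnload() is hypothetical, and this
deliberately ignores DLLs left behind by dependencies):

```r
## Hypothetical sketch of the proposed check: load and then unload a
## package's namespace, and verify that its own DLL is no longer loaded.
checkDllUnload <- function(pkg) {
  dlls0 <- names(getLoadedDLLs())
  loadNamespace(pkg)
  unloadNamespace(pkg)
  dlls1 <- names(getLoadedDLLs())
  ## DLLs still loaded afterwards (may include those of dependencies)
  leftover <- setdiff(dlls1, dlls0)
  if (pkg %in% leftover)
    warning("Unloading the namespace does not unload DLL: ", sQuote(pkg))
  invisible(!(pkg %in% leftover))
}
```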

/Henrik


On Mon, May 9, 2016 at 11:57 PM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> Qin Zhu <qinzhu at outlook.com>
>>>>>>     on Fri, 6 May 2016 11:33:37 -0400 writes:
>
>     > Thanks for all your great answers.
>     > The app I’m working on is indeed an exploratory data analysis tool for gene expression, which requires a bunch of bioconductor packages.
>
>     > I guess for now, my best solution is to divide my app into modules and load/unload packages as the user switch from one module to another.
>
>     > This brought me another question: it seems that unloading a package with the detach/unloadNamespace functions does not unload the DLLs, or in the case of the "SCDE" package, not all dependent DLLs:
>
>     >> length(getLoadedDLLs())
>     > [1] 9
>     >> requireNamespace("scde")
>     > Loading required namespace: scde
>     >> length(getLoadedDLLs())
>     > [1] 34
>     >> unloadNamespace("scde")
>     > now dyn.unload("/Library/Frameworks/R.framework/Versions/3.3/Resources/library/scde/libs/scde.so") ...
>     >> length(getLoadedDLLs())
>     > [1] 33
>
>     > Does that mean I should use dyn.unload to unload whatever I think is associated with that package when the user’s done using it? I’m a little nervous about this because this seems to be OS dependent and previous versions of my app are running on both windows and macs.
>
> Hmm, I thought that  dyn.unload() would typically work on all
> platforms, but did not research the question now, and am happy
> to learn more by being corrected.
>
> Even if we increase MAX_NUM_DLL in the future, a considerable
> portion of your app's users will not use that future version of R
> yet, and so you should try to "fight" the problem now.
>
>     > Any suggestions would be appreciated, and I’d appreciate if the MAX_NUM_DLLS can be increased.
>
>     > Thanks,
>     > Qin
>
>
>     >> On May 4, 2016, at 9:17 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
>     >>
>     >>
>     >>
>     >> On 05/04/2016 05:15 AM, Prof Brian Ripley wrote:
>     >>> On 04/05/2016 08:44, Martin Maechler wrote:
>     >>>>>>>>> Qin Zhu <qinzhu at outlook.com>
>     >>>>>>>>> on Mon, 2 May 2016 16:19:44 -0400 writes:
>     >>>>
>     >>>> > Hi,
>     >>>> > I’m working on a Shiny app for statistical analysis. I ran into
>     >>>> this "maximal number of DLLs reached" issue recently because my app
>     >>>> requires importing many other packages.
>     >>>>
>     >>>> > I’ve posted my question on stackoverflow
>     >>>> (http://stackoverflow.com/questions/36974206/r-maximal-number-of-dlls-reached
>     >>>> <http://stackoverflow.com/questions/36974206/r-maximal-number-of-dlls-reached>).
>     >>>>
>     >>>>
>     >>>> > I’m just wondering is there any reason to set the maximal
>     >>>> number of DLLs to be 100, and is there any plan to increase it/not
>     >>>> hardcoding it in the future? It seems many people are also running
>     >>>> into this problem. I know I can work around this problem by modifying
>     >>>> the source, but since my package is going to be used by other people,
>     >>>> I don’t think this is a feasible solution.
>     >>>>
>     >>>> > Any suggestions would be appreciated. Thanks!
>     >>>> > Qin
>     >>>>
>     >>>> Increasing that number is of course "possible"... but it also
>     >>>> costs a bit (adding to the fixed memory footprint of R).
>     >>>
>     >>> And not only that.  At the time this was done (and it was once 50) the
>     >>> main cost was searching DLLs for symbols.  That is still an issue, and
>     >>> few packages exclude their DLL from symbol search so if symbols have to
>     >>> searched for a lot of DLLs will be searched.  (Registering all the
>     >>> symbols needed in a package avoids a search, and nowadays by default
>     >>> searches from a namespace are restricted to that namespace.)
>     >>>
>     >>> See
>     >>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-native-routines
>     >>> for some further details about the search mechanism.
>     >>>
>     >>>> I did not set that limit, but I'm pretty sure it was also meant
>     >>>> as reminder for the useR to "clean up" a bit in her / his R
>     >>>> session, i.e., not load package namespaces unnecessarily. I
>     >>>> cannot yet imagine that you need > 100 packages | namespaces
>     >>>> loaded in your R session. OTOH, some packages nowadays have a
>     >>>> host of dependencies, so I agree that this at least may happen
>     >>>> accidentally more frequently than in the past.
>     >>>
>     >>> I am not convinced that it is needed.  The OP says he imports many
>     >>> packages, and I doubt that more than a few are required at any one time.
>     >>> Good practice is to load namespaces as required, using requireNamespace.
>     >>
>     >> Extensive package dependencies in Bioconductor make it pretty easy to end up with dozens of packages attached or loaded. For instance
>     >>
>     >> library(GenomicFeatures)
>     >> library(DESeq2)
>     >>
>     >> > length(loadedNamespaces())
>     >> [1] 63
>     >> > length(getLoadedDLLs())
>     >> [1] 41
>     >>
>     >> Qin's use case is a shiny app, presumably trying to provide relatively comprehensive access to a particular domain. Even if the app were to load / requireNamespace() (this requires considerable programming discipline to ensure that the namespace is available on all programming paths where it is used), it doesn't seem at all improbable that the user in an exploratory analysis would end up accessing dozens of packages with orthogonal dependencies. This is also the use case with Karl Forner's post https://stat.ethz.ch/pipermail/r-devel/2015-May/071104.html <https://stat.ethz.ch/pipermail/r-devel/2015-May/071104.html> (adding library(crlmm) to the above gets us to 53 DLLs).
>     >>
>     >>>
>     >>>> The real solution of course would be a code improvement that
>     >>>> starts with a relatively small number of "DLLinfo" structures
>     >>>> (say 32), and then allocates more batches (of size say 32) if
>     >>>> needed.
>     >>>
>     >>> The problem of course is that such code will rarely be exercised, and
>     >>> people have made errors on the boundaries (here multiples of 32) many
>     >>> times in the past.  (Note too that DLLs can be removed as well as added,
>     >>> another point of coding errors.)
>     >>
>     >> That argues for a simple increase in the maximum number of DLLs. This would enable some people to have very bulky applications that pay a performance cost (but the cost here is in small fractions of a second...) in terms of symbol look-up (and collision?), but would have no consequence for those of us with more sane use cases.
>
> I'm seconding Martin Morgan's argument.  We could go up to 200.
> Computer memory has been increasing a lot since we set the limit
> to 100, and the symbol search performance would indeed only be
> affected for those use cases with (too) many DLLs.
>
> Martin Maechler
>
>     >> Martin Morgan
>     >>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
