[Rd] proper use of reg.finalizer to close connections

Henrik Bengtsson hb at biostat.ucsf.edu
Mon Oct 27 18:27:41 CET 2014


...and don't forget to make sure that all the functions .myFinalizer()
calls are also still around. /Henrik
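
A minimal sketch of one way to keep those helpers around: build the
finalizer as a closure that carries its helper with it, and guard each
call so that one failure does not stop the rest of the cleanup. The names
make_finalizer, conns and closed are made up for illustration and are not
from the package discussed in this thread.

```r
## Hypothetical helper: returns a finalizer that holds its own reference
## to the 'disconnect' function it needs.
make_finalizer <- function(disconnect) {
  force(disconnect)  # capture the helper in the closure's environment
  function(env) {
    for (name in ls(env, all.names = TRUE)) {
      ## A failure while cleaning up one entry should not stop the others.
      tryCatch(disconnect(env[[name]]), error = function(e) NULL)
    }
    invisible(NULL)
  }
}

closed <- character(0)                  # records what was "disconnected"
conns <- new.env(parent = emptyenv())
conns$a <- "resource-a"
conns$b <- "resource-b"

fin <- make_finalizer(function(x) closed <<- c(closed, x))
reg.finalizer(conns, fin, onexit = TRUE)

## Called by hand here only to show what the finalizer does.
fin(conns)
```

Because the helper is passed in and captured, the finalizer does not have
to look it up by name when it eventually runs. Whether that is enough in a
real package depends on what the helper itself needs (for example,
compiled code that may already have been unloaded).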

On Mon, Oct 27, 2014 at 10:10 AM, Murat Tasan <mmuurr at gmail.com> wrote:
> Eh, after some flailing, I think I solved it.
> I _think_ this pattern should guarantee that the finalizer function is
> still present when needed:
>
> .STATE_CONTAINER <- new.env(parent = emptyenv())
> .STATE_CONTAINER$some_state_variable <- NULL  ## some code
> .STATE_CONTAINER$some_other_state_variable <- NULL  ## some code
>
> .myFinalizer <- function(name_of_state_variable_to_clean_up) {
>     ## clean up .STATE_CONTAINER[[name_of_state_variable_to_clean_up]] here
> }
>
> .onLoad <- function(libname, pkgname) {
>     reg.finalizer(
>         e = parent.env(environment()),
>         f = function(env) sapply(ls(env$.STATE_CONTAINER), .myFinalizer),
>         onexit = TRUE)
> }
>
> This way, the finalizer is registered on the enclosing environment of
> the .onLoad function, which should be the package's namespace
> environment. And that means .myFinalizer should still be around when
> it's called during q() or unload/gc().
> Effectively, the finalizer is tied to the entire package, rather than
> the state variable container(s), which might not be the most elegant
> solution, but it should work well enough for most purposes.
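
A hedged sketch of how the pattern above could be paired with an unload
hook, so that cleanup on unloading does not depend on when the garbage
collector gets around to running the finalizer. The names .STATE and
.cleanup_all are assumptions made for illustration; .onLoad and .onUnload
are the standard namespace hooks.

```r
.STATE <- new.env(parent = emptyenv())

## Hypothetical helper: cleans up every entry and removes it, so that
## calling it twice (hook first, finalizer later) is harmless.
.cleanup_all <- function() {
  for (name in ls(.STATE, all.names = TRUE)) {
    ## close .STATE[[name]] here
    rm(list = name, envir = .STATE)
  }
  invisible(NULL)
}

.onLoad <- function(libname, pkgname) {
  ## Enclosing environment of .onLoad, i.e. the namespace when this
  ## code lives in a package.
  ns <- parent.env(environment())
  reg.finalizer(ns, function(env) .cleanup_all(), onexit = TRUE)
}

.onUnload <- function(libpath) {
  .cleanup_all()
}
```

With this layout the hook handles unloading, and the finalizer with
onexit = TRUE is only the fallback for R quitting.
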
>
> Cheers and thanks for the advice,
>
> -m
>
> On Mon, Oct 27, 2014 at 12:18 AM, Murat Tasan <mmuurr at gmail.com> wrote:
>> Ah, good point, I hadn't thought of that detail.
>> Would moving reg.finalizer back outside of .onLoad and hooking it to the
>> package's environment itself work (more safely)?
>> Something like:
>> finalizerFunction <- function(env) {
>>     ## cleanup code
>> }
>> reg.finalizer(environment(), finalizerFunction)
>>
>> -m
>>
>> On Oct 26, 2014 11:03 PM, "Henrik Bengtsson" <hb at biostat.ucsf.edu> wrote:
>>>
>>> On Sun, Oct 26, 2014 at 8:14 PM, Murat Tasan <mmuurr at gmail.com> wrote:
>>> > Ah (again)!
>>> > Even with my fumbling presentation of the issue, you gave me the hint
>>> > that solved it, thanks!
>>> >
>>> > Yes, the reg.finalizer call needs to be wrapped in an .onLoad hook so
>>> > it's not called once during package installation and then never again.
>>> > And once I switched to using ls() (instead of names()), everything
>>> > works as expected.
>>> >
>>> > So, the package code effectively looks like so:
>>> >
>>> > .CONNS <- new.env(parent = emptyenv())
>>> > .onLoad <- function(libname, pkgname) {
>>> >     reg.finalizer(.CONNS, function(x) sapply(ls(x), .disconnect))
>>> > }
>>> > .disconnect <- function(x) {
>>> >     ## handle disconnection of .CONNS[[x]] here
>>> > }
>>>
>>> In your example above, I would be concerned about what happens if you
>>> detach/unload your package, because then your finalizer is still
>>> registered and will be called whenever '.CONNS' is garbage collected
>>> (or some time thereafter).  However, the finalizer function calls
>>> .disconnect(), which is no longer available.
>>>
>>> Finalizers should be used with great care, because you're not in
>>> control of the order in which things occur, of which "resources" are
>>> still around when the finalizer function is eventually called, or of
>>> when it is called.  I've been bitten by this a few times and it can be
>>> very hard to reproduce and troubleshoot such bugs.  See also the 'Note'
>>> section of ?reg.finalizer.
>>>
>>> My $.02
>>>
>>> /Henrik
>>>
>>> >
>>> > Cheers and thanks!
>>> >
>>> > -m
>>> >
>>> >
>>> >
>>> >
>>> > On Sun, Oct 26, 2014 at 8:53 PM, Gábor Csárdi <csardi.gabor at gmail.com>
>>> > wrote:
>>> >> Well, to be honest I don't understand fully what you are trying to do.
>>> >> If you want to run code when the package is detached or when it is
>>> >> unloaded, then use a hook:
>>> >> http://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Load-hooks
>>> >>
>>> >> If you want to run code when an object is freed, then use a finalizer.
>>> >>
>>> >> Note that when you install a package, R runs all the code in the
>>> >> package and only stores the results of the code in the installed
>>> >> package. So if you create an object outside of a function in your
>>> >> package, then only the object will be stored in the package, but not
>>> >> the code that creates it. The object will be simply loaded when you
>>> >> load the package, but it will not be re-created.
>>> >>
>>> >> Now, I am not sure what happens if you set the finalizer on such an
>>> >> object in the package. I can imagine that the finalizer will not be
>>> >> saved into the package, and is only used once, when
>>> >> building/installing the package. In this case you'll need to set the
>>> >> finalizer in .onLoad().
>>> >>
>>> >> Gabor
>>> >>
>>> >> On Sun, Oct 26, 2014 at 10:35 PM, Murat Tasan <mmuurr at gmail.com> wrote:
>>> >>> Ah, thanks for the ls() vs names() tip!
>>> >>> (But sadly, it didn't solve the issue... )
>>> >>>
>>> >>> So, after some more tinkering, I believe the finalizer is being called
>>> >>> _sometimes_.
>>> >>> I changed the reg.finalizer(...) call to just this:
>>> >>>
>>> >>> reg.finalizer(.CONNS, function(x) print("foo"), onexit = TRUE)
>>> >>>
>>> >>> Now, when I load the package and detach(..., unload = TRUE), nothing
>>> >>> prints.
>>> >>> And when I quit, nothing prints.
>>> >>>
>>> >>> If I, however, create an environment on the workspace, like so:
>>> >>>   e <- new.env(parent = emptyenv())
>>> >>>   reg.finalizer(e, function(x) print("bar"), onexit = TRUE)
>>> >>> When I quit (or rm(e)), "bar" is printed.
>>> >>> But no "foo" (corresponding to same sequence of code, just in the
>>> >>> package instead).
>>> >>>
>>> >>> BUT(!), when I _install_ the package, "foo" is printed at the end of
>>> >>> the "**testing if installed package can be loaded" installation
>>> >>> segment.
>>> >>> So, somehow the R script that tests for package loading/unloading is
>>> >>> triggering the finalizer (which is good).
>>> >>> Yet, I cannot seem to trigger it myself when either quitting or
>>> >>> forcing a package unload (which is bad).
>>> >>>
>>> >>> Any ideas why the installation script would successfully trigger a
>>> >>> finalizer while standard unloading or quitting wouldn't?
>>> >>>
>>> >>> Cheers and thanks!
>>> >>>
>>> >>> -m
>>> >>>
>>> >>> On Sun, Oct 26, 2014 at 8:03 PM, Gábor Csárdi <csardi.gabor at gmail.com>
>>> >>> wrote:
>>> >>>> Hmmm, I guess you will want to put the actual objects that represent
>>> >>>> the connections into the environment, at least this seems to be the
>>> >>>> easiest to me. Btw. you need ls() to list the contents of an
>>> >>>> environment, instead of names(). E.g.
>>> >>>>
>>> >>>> e <- new.env()
>>> >>>> e$foo <- 10
>>> >>>> e$bar <- "aaa"
>>> >>>> names(e)
>>> >>>> #> NULL
>>> >>>> ls(e)
>>> >>>> #> [1] "bar" "foo"
>>> >>>> reg.finalizer(e, function(x) { print(ls(x)) })
>>> >>>> #> NULL
>>> >>>> rm(e)
>>> >>>> gc()
>>> >>>> #> [1] "bar" "foo"
>>> >>>> #>           used (Mb) gc trigger  (Mb) max used  (Mb)
>>> >>>> #> Ncells 1528877 81.7    2564037 137.0  2564037 137.0
>>> >>>> #> Vcells 3752538 28.7    7930384  60.6  7930356  60.6
>>> >>>>
>>> >>>> More precisely, you probably want to represent each connection as a
>>> >>>> separate environment, with its own finalizer. Hope this helps,
>>> >>>> Gabor
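
A minimal sketch of the "one environment per connection" idea, assuming a
made-up constructor new_conn() and made-up fields id and open; the real
connect/disconnect code would replace the placeholder comment.

```r
## Hypothetical constructor: each connection is its own environment
## with its own finalizer.
new_conn <- function(id) {
  conn <- new.env(parent = emptyenv())
  conn$id <- id
  conn$open <- TRUE
  reg.finalizer(conn, function(e) {
    if (isTRUE(e$open)) {
      ## real disconnection code would go here
      e$open <- FALSE
    }
  }, onexit = TRUE)
  conn
}

c1 <- new_conn("db1")
c2 <- new_conn("db2")

rm(c1)           # drop the only reference to the first connection
invisible(gc())  # its finalizer may run now; c2 is left alone
```

Each connection is then cleaned up when it becomes unreachable or when R
exits, independently of the others.
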
>>> >>>>
>>> >>>> On Sun, Oct 26, 2014 at 9:49 PM, Murat Tasan <mmuurr at gmail.com>
>>> >>>> wrote:
>>> >>>>> Hi all, I have a question about finalizers...
>>> >>>>> I have a package that manages state for a few connections, and I'd
>>> >>>>> like to ensure that these connections are 'cleanly' closed upon either
>>> >>>>> (i) R quitting or (ii) an unloading of the package.
>>> >>>>> So, in a pared-down example package with a single R file, it looks
>>> >>>>> something like:
>>> >>>>>
>>> >>>>> ##### BEGIN PACKAGE CODE #####
>>> >>>>> .CONNS <- new.env(parent = emptyenv())
>>> >>>>> .CONNS$resource1 <- NULL
>>> >>>>> .CONNS$resource2 <- NULL
>>> >>>>> ## some more .CONNS resources...
>>> >>>>>
>>> >>>>> reg.finalizer(.CONNS, function(x) sapply(names(x), disconnect),
>>> >>>>> onexit = TRUE)
>>> >>>>>
>>> >>>>> connect <- function(x) {
>>> >>>>>   ## here lies code to connect and update .CONNS[[x]]
>>> >>>>> }
>>> >>>>> disconnect <- function(x) {
>>> >>>>>   print(sprintf("disconnect(%s)", x))
>>> >>>>>   ## here lies code to disconnect and update .CONNS[[x]]
>>> >>>>> }
>>> >>>>> ##### END PACKAGE CODE #####
>>> >>>>>
>>> >>>>> The print(...) statement in disconnect(...) is there as a trace, as I
>>> >>>>> hoped that I'd see disconnect(...) being called when I quit (or
>>> >>>>> detach(..., unload = TRUE)).
>>> >>>>> But, it doesn't appear that disconnect(...) is ever called when the
>>> >>>>> package (and .CONNS) falls out of memory/scope (and I ran gc() after
>>> >>>>> detach(...), just to be sure).
>>> >>>>>
>>> >>>>> In a second 'shot-in-the-dark' attempt, I placed the reg.finalizer
>>> >>>>> call inside an .onLoad function, but that didn't seem to work,
>>> >>>>> either.
>>> >>>>>
>>> >>>>> I'm guessing my use of reg.finalizer is way off-base here... but I
>>> >>>>> cannot infer from the reg.finalizer man page what I might be doing
>>> >>>>> wrong.
>>> >>>>> Is there a way to see, at the R-system level, what functions have been
>>> >>>>> registered as finalizers?
>>> >>>>>
>>> >>>>> Thanks for any pointers!
>>> >>>>>
>>> >>>>> -Murat
>>> >>>>>
>>> >>>>> ______________________________________________
>>> >>>>> R-devel at r-project.org mailing list
>>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel


