[R] function pointers?

Paul Johnson pauljohn32 at gmail.com
Wed Nov 22 17:29:58 CET 2017


We have a project that calls for the creation of a list of many
distribution objects.  Distributions can be of various types, with
various parameters, but we ran into some problems. I started testing
on a simple list of rnorm-based objects.

I was a little surprised at the RAM storage requirements. Here's an example:

N <- 10000
closureList <- vector("list", N)
nsize <- sample(x = 1:100, size = N, replace = TRUE)
for (i in seq_along(nsize)) {
    closureList[[i]] <- list(func = rnorm, n = nsize[i])
}
format(object.size(closureList), units = "Mb")

Output says
22.4 MB

I noticed that if I do not name the objects in the list, then the
storage drops to 19.9 MB.

That seemed like a lot of storage for a function's name. Why so much?
My colleagues think the RAM use is high because this is a closure
(hence closureList).  I can't even convince myself it actually is a
closure. The R source has

rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)

The list appears to be holding 10000 copies of rnorm, but we really
need only one, which all of the objects could share.
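A small check of how that number is put together, offered as a sketch rather than a diagnosis. It assumes, going by the object.size() help page, that object.size() does not detect elements shared within a list, so rnorm is charged once per element whether or not R keeps a single copy in memory. The names demoList and nItems are made up for this illustration.

```r
## Hypothetical names: nItems, demoList (not part of the original example)
nItems <- 1000
demoList <- vector("list", nItems)
for (i in seq_len(nItems)) {
    demoList[[i]] <- list(func = rnorm, n = i)
}

## Size that object.size() reports for one rnorm, in bytes
perFunction <- as.numeric(object.size(rnorm))
## Size that object.size() reports for the whole list, in bytes
total <- as.numeric(object.size(demoList))

## If rnorm is counted once per element, the total is at least
## nItems times the size of a single rnorm
total >= nItems * perFunction
```

If the comparison comes out TRUE, the reported size is an upper bound on memory use and says nothing about whether the function is physically duplicated.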

Thinking of this like C, I am looking to pass in a pointer to the
function.  I found my way to the idea of putting a function in an
environment in order to pass it by reference:

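## Wrap a distribution function and its n in an environment.
## Environments are passed by reference, so a list stores only a handle.
rnormPointer <- function(inputValue1, inputValue2){
    object <- new.env(parent=globalenv())
    object$distr <- inputValue1   # the function, e.g. rnorm
    object$n <- inputValue2       # the sample size
    class(object) <- 'pointer'
    object
}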

## Experiment with that
gg <- rnormPointer(rnorm, 33)
gg$distr(gg$n)
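
One consequence of using environments this way is that they have reference semantics, unlike lists. The sketch below contrasts the two; e1, e2, l1 and l2 are throwaway names used only for this illustration.

```r
## Environment: assignment does not copy, both names see the change
e1 <- new.env()
e1$n <- 1
e2 <- e1
e2$n <- 99
e1$n    # changed through e2

## List: assignment behaves like a copy, the original is untouched
l1 <- list(n = 1)
l2 <- l1
l2$n <- 99
l1$n    # still the original value
```

So any code that modifies one of these 'pointer' objects modifies it for every holder of that object.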

ptrList <- vector("list", N)
for(i in seq_along(nsize)) {
    ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
}
format(object.size(ptrList), units = "Mb")

The required storage is reduced to 2.6 Mb. That's about 1/10 of the RAM
required for closureList.  This works the way I expect:

## can pass in the unnamed arguments for n, mean and sd here
ptrList[[1]]$distr(33, 100, 10)
## Or the named arguments
ptrList[[1]]$distr(1, sd = 100)
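
A quick way to probe whether object.size() looks inside an environment at all, as a sketch under the assumption (from the object.size() help page) that the contents of environments are not included in the calculation. The names emptyEnv, fullEnv and payload are invented for this test.

```r
## Hypothetical names: emptyEnv, fullEnv, payload
emptyEnv <- new.env()
fullEnv <- new.env()
fullEnv$payload <- numeric(1e6)   # about 8 MB of doubles

object.size(emptyEnv)
object.size(fullEnv)

## If contents are ignored, both environments report the same size
as.numeric(object.size(emptyEnv)) == as.numeric(object.size(fullEnv))
```

If the two sizes match, the 2.6 Mb figure for ptrList counts only the handles and attributes, not what is stored in each environment.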

This environment trick mostly works, so far as I can see, but I have
these questions.

1. Is the value returned by object.size() accurate for ptrList?  Do I really
reduce storage to that amount, or is the required storage someplace
else (in the new environment) that is not included in object.size()?

2. Am I running with scissors here? Unexpected bad things await?

3. Why is the storage for closureList so great? It looks to me like
rnorm is just this little thing:

function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<bytecode: 0x55cc9988cae0>

4. Could I learn (you show me?) to store the bytecode address as a
thing and use it in the objects?  I'd guess that is the fastest
possible way. In an Objective-C problem in the olden days, we found
the method-lookup was a major slowdown and one of the programmers
showed us how to save the lookup and use it over and over.
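
For question 4, the closest thing I can sketch in plain R is to look the function up once, bind it to a local name, and call that name repeatedly. This is only an analogy to the Objective-C trick, not a way of storing a bytecode address, and holder, f and draws are hypothetical names.

```r
## Hypothetical names: holder, f, draws
holder <- new.env()
holder$distr <- rnorm

## Look the function up once and keep it in a local variable
f <- holder$distr
identical(f, rnorm)

## Reuse the local binding without going back through the environment
draws <- vapply(1:5, function(i) length(f(i)), integer(1))
draws
```

Whether this saves any measurable time would need benchmarking; the sketch only shows that the binding can be reused.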

pj



-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.


