[Rd] how to control the environment of a formula

Thomas Alexander Gerds tag at biostat.ku.dk
Sun Apr 21 08:18:29 CEST 2013


thanks. yes, I was considering using as.character(f), but your solution
2 is much better -- did not know ' was an R function as well. just
checked: model.frame does not get confused, and this will be used to
evaluate the formula by all functions in my packages.

however, there could be related problems with memory. I noticed that
some of my processes use an unexpectedly large amount of memory. how
can one trace this?

I am not desperate to save disk space: the problem is that file transfer
and sharing (like dropbox) suffer when each simulation result fills 8M
instead of 130K just because a large data set is invisibly sitting in
the saved file.
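
A minimal sketch of how the hidden data can be made visible. The function make_result() and its contents are hypothetical, and length(serialize(obj, NULL)) is used only as a rough proxy for the uncompressed size that save() has to write. Resetting the environment to globalenv() is done here purely for comparison and may not suit every modelling function.

```r
# Hypothetical simulation function: returns a small result plus a formula
make_result <- function(n = 1e5) {
  x <- rnorm(n)                      # large temporary data
  list(stat = mean(x), f = a ~ b)    # the formula captures this function's frame
}

res <- make_result()

# Objects that travel along with the formula
captured <- ls(environment(res$f))

# Approximate number of bytes needed to store the result
size_before <- length(serialize(res, NULL))

# For comparison only: detach the formula from the function's frame
environment(res$f) <- globalenv()
size_after <- length(serialize(res, NULL))

print(captured)
print(c(before = size_before, after = size_after))
```

If the first number is far larger than the second, the difference is data sitting in the formula's environment.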

Duncan Murdoch <murdoch.duncan at gmail.com> writes:

> On 13-04-19 2:57 PM, Thomas Alexander Gerds wrote:
>> hmm. I have tested a bit more, and found this situation, which is
>> perhaps more difficult to solve. even though I delete x, since x is
>> part of the output of the function, the size of the object is twice
>> as much as it should be:
>>
>> test <- function(x){
>>   x <- rnorm(1000000)
>>   out <- list(x=x)
>>   rm(x)
>>   out$f <- as.formula(a~b)
>>   out
>> }
>> v <- test(1)
>> x <- rnorm(1000000)
>> save(v,file="~/tmp/v.rda")
>> save(x,file="~/tmp/x.rda")
>> system("ls -lah ~/tmp/*.rda")
>>
>> -rw-rw-r-- 1 tag tag 15M Apr 19 20:52 /home/tag/tmp/v.rda
>> -rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda
>>
>> can you solve this as well?
>
> Yes, this is tricky.  The problem is that "out" is in the environment
> of out$f, so you get two copies when you save it.  (I think you won't
> have two copies in memory, because R only makes a copy when it needs
> to, but I haven't traced this.)
>
> Here are two solutions, both have some problems.
>
> 1.  Don't put out in the environment:
>
> test <- function(x) {
>   x <- rnorm(1000000)
>   out <- list(x=x)
>   out$f <- a ~ b  # the as.formula() was never needed
>   # temporarily create a new environment
>   local({
>     # get a copy of what you want to keep
>     out <- out
>     # remove everything that you don't need from the formula's environment
>     rm(list=c("x", "out"), envir=environment(out$f))
>     # return the local copy
>     out
>   })
> }
>
> I don't like this because it is too tricky, but you could probably
> wrap the tricky bits into a little function (a variant on return()
> that cleans out the environment first), so it's probably what I would
> use if I was desperate to save space in saved copies.
>
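
A minimal sketch of the "little function" idea mentioned above, not code from the thread. The helper name clean_env() and its keep argument are hypothetical, and the sketch assumes the formula was created directly in the calling function, so that the caller's frame is the formula's environment.

```r
# Hypothetical helper: empty the caller's frame (except `keep`), then return `value`
clean_env <- function(value, env = parent.frame(), keep = character()) {
  force(value)  # evaluate the result before anything is removed
  force(env)    # fix the caller's frame before using it
  rm(list = setdiff(ls(env, all.names = TRUE), keep), envir = env)
  value
}

test <- function(n = 1e5) {
  x <- rnorm(n)
  out <- list(x = x)
  out$f <- a ~ b
  clean_env(out)   # used in place of a plain `out` as the last expression
}

v <- test()
left_over <- ls(environment(v$f), all.names = TRUE)
print(left_over)
```

Variables that the formula needs at evaluation time would be listed in keep so that they survive the clean-up.
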
> 2. Never evaluate the formula in the first place, so it doesn't pick
> up the environment:
>
> test <- function(x) {
>   x <- rnorm(1000000)
>   out <- list(x=x)
>   out$f <- quote(a ~ b)
>   out
> }
>
> This is a lot simpler, but it might not work with some modelling
> functions, which would be confused by receiving the model formula
> unevaluated.  It also has the problems that you get with using
> .GlobalEnv as the environment of the formula, but maybe to a slightly
> lesser extent: rather than having what is possibly the wrong
> environment, it doesn't have one at all.
>
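
A short sketch of what solution 2 stores and how it could be turned into a formula later. The choice of globalenv() as the evaluation environment is an assumption made for illustration; any environment that holds the variables the formula needs could be used instead.

```r
test <- function(n = 1e5) {
  x <- rnorm(n)
  out <- list(x = x)
  out$f <- quote(a ~ b)   # stored as an unevaluated call, not a formula
  out
}

v <- test()
is_call <- is.call(v$f)            # the stored object is a call
has_env <- !is.null(environment(v$f))  # it carries no environment

# Evaluate on demand, choosing the environment explicitly
f <- eval(v$f, envir = globalenv())
print(class(f))
```

Evaluating the call is what creates the formula, and the formula takes the environment in which that evaluation happens.
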
> Duncan Murdoch
>
>> thanks!  thomas
>> Duncan Murdoch <murdoch.duncan at gmail.com> writes:
>>
>>> On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
>>>> Dear Duncan thank you for taking the time to answer my questions!
>>>> It will be quite some work to delete all the objects generated
>>>> inside the function ... but if there is no other way to avoid a
>>>> large environment then this is what I will do.
>>> It's not really that hard.  Use names <- ls() in the function to
>>> get a list of all of them; remove the names of variables that might
>>> be needed in the formula (and the name of the formula itself); then
>>> use rm(list=names) to delete everything else just before returning
>>> it.
>>> Duncan Murdoch
>>>
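
The ls()/rm() recipe described above can be sketched as follows. This is an illustration under assumptions, not code from the thread: the function make_formula() and the variables big and a_scale are hypothetical.

```r
make_formula <- function(n = 1e5) {
  big <- rnorm(n)            # large temporary that should not be kept
  a_scale <- 2               # hypothetical variable the formula might need
  f <- a ~ I(b * a_scale)

  names <- ls()                               # everything in this function's frame
  names <- setdiff(names, c("a_scale", "f"))  # keep what the formula needs, and the formula
  rm(list = names)                            # delete everything else before returning
  f
}

f <- make_formula()
kept <- ls(environment(f))
print(kept)
```

The variable names itself stays behind because it is created after ls() is called, but it is only a short character vector.
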

-- 
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark


