[Rd] Model object, when generated in a function, saves entire environment when saved

Duncan Murdoch murdoch.duncan at gmail.com
Wed Jul 27 21:11:53 CEST 2016


On 27/07/2016 1:48 PM, Kenny Bell wrote:
> In the below, I generate a model from an environment that isn't
> .GlobalEnv with a large object that is unrelated to the model
> generation. It seems to save the irrelevant object unnecessarily. In
> my actual use case, I am running and saving many models in a loop that
> each use a single large data.frame (that gets collapsed into a small
> data.frame for estimation), so removing it isn't an option.

If each of those many models refers to the object in the formula, then 
you need to keep it.  But you'll only have one copy of it in memory, 
because all of the models point to the same environment, and 
environments are reference objects in R.
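
As a rough sketch of what "one copy" means here (the names make_models 
and big are made up for illustration, and iris stands in for real data), 
two fits created in the same function call carry a reference to the same 
environment rather than two copies of it:

```r
# Hypothetical example; `make_models` and `big` are illustration-only names.
make_models <- function() {
  big <- rnorm(1e5)  # large object that the fits never use
  list(
    lm(Sepal.Length ~ Sepal.Width,  data = iris),
    lm(Sepal.Length ~ Petal.Length, data = iris)
  )
}

models <- make_models()

# Each fit keeps a reference to the environment its formula was created in.
env1 <- attr(terms(models[[1]]), ".Environment")
env2 <- attr(terms(models[[2]]), ".Environment")

identical(env1, env2)                          # same environment, not two copies
exists("big", envir = env1, inherits = FALSE)  # the unrelated object lives there
```

Both checks should come back TRUE: the large object is reachable from 
either model, but it exists only once in the session.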

If your loop looks like this,

model <- vector("list", n)
for (i in 1:n) {
   subset <- bigdf[ fn(i), ]
   model[[i]] <- lm(y ~ x, data = subset)
}

then you might be in trouble.  There is only one "subset" variable in 
the environment, and each iteration overwrites it, so any code that 
later looks it up there will get the last one, not the one used to fit 
the i-th model.
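
A small sketch of how that can go wrong, under some assumptions: the 
data are simulated, the grouping column g takes the place of fn(i), and 
the fits use model = FALSE so that model.frame() has to re-evaluate the 
data argument instead of returning a stored copy.

```r
# Simulated stand-in for the large data frame; `g` replaces fn(i).
set.seed(1)
bigdf <- data.frame(g = rep(1:3, each = 20), x = rnorm(60))
bigdf$y <- bigdf$g * bigdf$x + rnorm(60, sd = 0.1)

model <- vector("list", 3)
for (i in 1:3) {
  subset <- bigdf[bigdf$g == i, ]
  # model = FALSE: do not store the model frame inside the fit
  model[[i]] <- lm(y ~ x, data = subset, model = FALSE)
}

# model.frame() re-evaluates `data = subset` in the formula's environment,
# where `subset` now holds the rows from the last iteration (g == 3).
mf1 <- model.frame(model[[1]])
isTRUE(all.equal(mf1$x, bigdf$x[bigdf$g == 3]))
```

The last line should be TRUE: the first model was fitted on the g == 1 
rows, but asking it for its data afterwards returns the g == 3 rows.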

One way around this is to write a nested function to create the subset 
variable, e.g.

  nested <- function(subset) {
    lm(y ~ x, data = subset)
  }
  for (i in 1:n)
    model[[i]] <- nested(bigdf[ fn(i), ])
  rm(bigdf)

and it will be safe to remove bigdf after the loop.  (I see that Bill 
Dunlap has posted a different way of achieving the same sort of thing.)
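
To check that behaviour, here is a self-contained sketch under the same 
kind of assumptions as before: simulated data, a grouping column g in 
place of fn(i), and model = FALSE so that model.frame() must look the 
data up again. The variable x_group1 exists only for the comparison at 
the end.

```r
# Simulated stand-in for the large data frame; `g` replaces fn(i).
set.seed(1)
bigdf <- data.frame(g = rep(1:3, each = 20), x = rnorm(60))
bigdf$y <- bigdf$g * bigdf$x + rnorm(60, sd = 0.1)
x_group1 <- bigdf$x[bigdf$g == 1]   # kept only to check the result below

nested <- function(subset) {
  lm(y ~ x, data = subset, model = FALSE)
}

model <- vector("list", 3)
for (i in 1:3)
  model[[i]] <- nested(bigdf[bigdf$g == i, ])
rm(bigdf)

# Each call to nested() has its own frame, so each fit has its own `subset`.
env1 <- attr(terms(model[[1]]), ".Environment")
env2 <- attr(terms(model[[2]]), ".Environment")
identical(env1, env2)

mf1 <- model.frame(model[[1]])      # evaluates `subset` in nested()'s frame
isTRUE(all.equal(mf1$x, x_group1))
```

The first comparison should be FALSE (one environment per model) and the 
second TRUE (the first model still finds its own rows after bigdf is 
gone).  Each environment holds only the small subset, not bigdf.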

Duncan Murdoch

>
> In the case where the model exists in .GlobalEnv, everything is
> peachy. So replicating whatever happens when saving the model that was
> generated in .GlobalEnv at the return() stage of the function call
> would fix this problem.
>
> I was referred to this list from r-bugs. First time r-devel poster.
>
> Hope this helps,
>
> Kendon
>
> ```
> tmp_fun <- function(x){
>   iris_big <- lapply(1:10000, function(x) iris)
>   lm(Sepal.Length ~ Sepal.Width, data = iris)
> }
>
> out <- tmp_fun(1)
> object.size(out)
> # 48008
> save(out, file = "tmp.RData", compress = FALSE)
> file.size("tmp.RData")
> # 57196752 - way too big
>
> # Works fine when in .GlobalEnv
> iris_big <- lapply(1:10000, function(x) iris)
> out <- lm(Sepal.Length ~ Sepal.Width, data = iris)
>
> object.size(out)
> # 48008
> save(out, file = "tmp.RData", compress = FALSE)
> file.size("tmp.RData")
> # 16641 - good size.
> ```