[Rd] Model object, when generated in a function, saves entire environment when saved

Kenny Bell kmbell56 at gmail.com
Wed Jul 27 21:31:52 CEST 2016


Thanks so much for all this.

The first solution is what I'm going with as I want the terms object to
come along so that predict still works.
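
For the record, a minimal sketch of what that looks like, reusing tfun1
from Bill's message quoted below. The name `fit`, the subset 1:50, and the
newdata values are only illustrative choices, not part of the original
suggestion.

```r
# Sketch only: tfun1 as given in the quoted message, followed by a check
# that the fitted model still supports predict().
tfun1 <- function(subset) {
    junk <- 1:1e6                          # large object unrelated to the fit
    env <- new.env(parent = globalenv())   # small environment for the formula
    env$subset <- subset
    with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
}

fit <- tfun1(1:50)   # illustrative subset

# The environment attached to the terms object holds only what was
# placed there explicitly, so `junk` is not dragged along on save().
ls(environment(terms(fit)))

# predict() still works because the terms object is part of the model.
predict(fit, newdata = data.frame(Sepal.Width = c(3, 3.5)))
```

The trade-off against the second solution is that the saved file is a bit
larger, since the whole lm object is kept rather than only the coefficients.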

On Wed, Jul 27, 2016 at 12:28 PM, William Dunlap via R-devel <
r-devel at r-project.org> wrote:

> Another solution is to only save the parts of the model object that
> interest you.  As long as they don't include the formula (which is
> what drags along the environment it was created in), you will
> save space.  E.g.,
>
> tfun2 <- function(subset) {
>    junk <- 1:1e6
>    list(subset=subset,
>         lm(Sepal.Length ~ Sepal.Width, data=iris, subset=subset)$coef)
> }
>
> saveSize(tfun2(1:4))
> #[1] 152
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Jul 27, 2016 at 11:19 AM, William Dunlap <wdunlap at tibco.com>
> wrote:
>
> > One way around this problem is to make a new environment whose
> > parent environment is .GlobalEnv and which contains only what the
> > call to lm() requires and to compute lm() in that environment.
> > E.g.,
> >
> > tfun1 <- function (subset)
> > {
> >     junk <- 1:1e+06
> >     env <- new.env(parent = globalenv())
> >     env$subset <- subset
> >     with(env, lm(Sepal.Length ~ Sepal.Width, data = iris,
> >                  subset = subset))
> > }
> > Then we get
> >    > saveSize(tfun1(1:4)) # see below for def. of saveSize
> >    [1] 910
> > instead of the 2129743 bytes in the save file when using the naive
> > method.
> >
> > saveSize <- function (object) {
> >     tf <- tempfile(fileext = ".RData")
> >     on.exit(unlink(tf))
> >     save(object, file = tf)
> >     file.size(tf)
> > }
> >
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Wed, Jul 27, 2016 at 10:48 AM, Kenny Bell <kmb56 at berkeley.edu> wrote:
> >
> >> In the below, I generate a model from an environment that isn't
> >> .GlobalEnv with a large object that is unrelated to the model
> >> generation. It seems to save the irrelevant object unnecessarily. In
> >> my actual use case, I am running and saving many models in a loop that
> >> each use a single large data.frame (that gets collapsed into a small
> >> data.frame for estimation), so removing it isn't an option.
> >>
> >> In the case where the model exists in .GlobalEnv, everything is
> >> peachy. So replicating whatever happens when saving the model that was
> >> generated in .GlobalEnv at the return() stage of the function call
> >> would fix this problem.
> >>
> >> I was referred to this list from r-bugs. First time r-devel poster.
> >>
> >> Hope this helps,
> >>
> >> Kendon
> >>
> >> ```
> >> tmp_fun <- function(x){
> >>   iris_big <- lapply(1:10000, function(x) iris)
> >>   lm(Sepal.Length ~ Sepal.Width, data = iris)
> >> }
> >>
> >> out <- tmp_fun(1)
> >> object.size(out)
> >> # 48008
> >> save(out, file = "tmp.RData", compress = FALSE)
> >> file.size("tmp.RData")
> >> # 57196752 - way too big
> >>
> >> # Works fine when in .GlobalEnv
> >> iris_big <- lapply(1:10000, function(x) iris)
> >> out <- lm(Sepal.Length ~ Sepal.Width, data = iris)
> >>
> >> object.size(out)
> >> # 48008
> >> save(out, file = "tmp.RData", compress = FALSE)
> >> file.size("tmp.RData")
> >> # 16641 - good size.
> >> ```
> >>
> >>         [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
> >
>
