[Rd] Model object, when generated in a function, saves entire environment when saved

Harvey Smith h@rvey13131 @end|ng |rom gm@||@com
Thu Jan 30 22:53:47 CET 2020


Depending on whether you need the data in the referenced environments
later, you could fit the model normally and use the refhook argument of
saveRDS/readRDS to replace references to environments in the model with a
dummy value.

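normal_lm <- function() {
  # junk stays in the function environment that the model formula captures
  junk <- runif(1e+08)
  lm(Sepal.Length ~ Sepal.Width, data = iris)
}

object <- normal_lm()
tf <- tempfile(fileext = ".rds")

# On save, replace each referenced environment with a dummy string
saveRDS(object, file = tf, refhook = function(...) "")

# On load, map each dummy string back to an existing environment
object2 <- readRDS(file = tf, refhook = function(...) .GlobalEnv)

file.size(tf)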
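
To check that the model is still usable after the round trip, something like the following sketch may help. It uses a smaller junk vector (1e6 values) so it runs quickly, and the names small_lm, tf_full and tf_hook are made up for this illustration. It assumes the fitted model references no environments other than the one created by the fitting function.

```r
# Illustrative only: small_lm, tf_full and tf_hook are hypothetical names.
small_lm <- function() {
  junk <- runif(1e6)  # captured by the formula's environment
  lm(Sepal.Length ~ Sepal.Width, data = iris)
}
fit <- small_lm()

tf_full <- tempfile(fileext = ".rds")
tf_hook <- tempfile(fileext = ".rds")

# Plain save: the function environment (including junk) is written too.
saveRDS(fit, file = tf_full)

# Hooked save: every referenced environment is replaced by a dummy string.
saveRDS(fit, file = tf_hook, refhook = function(x) "")

# Hooked load: every dummy string is mapped to the global environment.
fit2 <- readRDS(tf_hook, refhook = function(x) .GlobalEnv)

sizes <- c(full = file.size(tf_full), hooked = file.size(tf_hook))
same_coef <- isTRUE(all.equal(coef(fit), coef(fit2)))
env_replaced <- identical(attr(terms(fit2), ".Environment"), globalenv())

print(sizes)
print(c(same_coef = same_coef, env_replaced = env_replaced))
```

The trade-off is that anything that really needed the original environment (for example, variables looked up by the formula outside of `data`) is no longer available in the restored object.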

On Wed, Jan 29, 2020 at 3:24 PM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 29/01/2020 2:25 p.m., Kenny Bell wrote:
> > Reviving an old thread. I haven't noticed this be a problem for a while
> > when saving RDS's which is great. However, I noticed the problem again
> > when saving `qs` files (https://github.com/traversc/qs) which is an RDS
> > replacement with a fast serialization / compression system.
> >
> > I'd like to get an idea of what change was made within R to address this
> > issue for `saveRDS`. My thought is that this will help the author of the
> > `qs` package do something similar. I have had a browse through the
> > release notes for the last few years (Ctrl-F-ing "environment") and
> > couldn't see it.
>
> The vector 1:1e+08 is stored very compactly in recent R versions (the
> start and end plus a marker that it's a sequence), and it appears
> saveRDS takes advantage of that while qs::qsave doesn't.  That's not a
> very useful test, because environments typically aren't filled with long
> sequence vectors.  If you replace the line
>
>    junk <- 1:1e+08
>
> with
>
>    junk <- runif(1e+08)
>
> you'll see drastically different results:
>
>  > save_size_qs(normal_lm())
> [1] 417953609
>  > #> [1] 848396
>  > save_size_rds(normal_lm())
> [1] 532614827
>  > #> [1] 4163
>  > save_size_qs(normal_ggplot())
> [1] 417967987
>  > #> [1] 857446
>  > save_size_rds(normal_ggplot())
> [1] 532624477
>  > #> [1] 12895
>
> Duncan Murdoch
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
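
The sequence-versus-random comparison quoted above can be reproduced on a smaller scale with a sketch like this one. The helper save_size_rds here is a guess at what the helper from the earlier thread does (write to a temporary file and report its size), and lm_with_junk is a made-up name. It assumes R >= 3.5.0, where compact sequences exist and saveRDS accepts version = 3.

```r
# Hypothetical helper: size in bytes of an object written with saveRDS().
save_size_rds <- function(object) {
  tf <- tempfile(fileext = ".rds")
  on.exit(unlink(tf))
  # Serialization format 3 can store a compact sequence without expanding it.
  saveRDS(object, file = tf, version = 3)
  file.size(tf)
}

# Fit the same model next to whatever junk vector make_junk() produces.
lm_with_junk <- function(make_junk) {
  junk <- make_junk()  # lives in the environment captured by the formula
  lm(Sepal.Length ~ Sepal.Width, data = iris)
}

size_seq  <- save_size_rds(lm_with_junk(function() 1:1e6))
size_rand <- save_size_rds(lm_with_junk(function() runif(1e6)))

print(c(sequence = size_seq, random = size_rand))
```

With a sequence as junk the file should stay small, while random numbers make it grow with the length of the vector, since they cannot be stored as start and end points and compress poorly.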
