[R] small object but huge RData file exported

Henrik Bengtsson
Wed Oct 20 22:06:06 CEST 2021


Example illustrating what Duncan says:

> make_formula <- function() { large <- rnorm(1e6); x ~ y }
> formula <- make_formula()

# "Apparent" size of object
> object.size(formula)
728 bytes

# Actual serialization size
> length(serialize(formula, connection = NULL))
[1] 8000203

# A better size estimate
> lobstr::obj_size(formula)
8,000,888 B
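
To see where the extra bytes come from, one can look inside the environment attached to the formula. A minimal sketch in base R, reusing make_formula() from above; the names env, n_large and n_bytes are only for illustration:

```r
make_formula <- function() { large <- rnorm(1e6); x ~ y }
formula <- make_formula()

# The formula keeps a reference to the evaluation frame of make_formula()
env <- environment(formula)
ls(env)                        # lists what is still reachable via the formula
n_large <- length(env$large)   # the 1e6-element vector is still alive

# serialize() follows that reference, so 'large' is written out as well
n_bytes <- length(serialize(formula, connection = NULL))
```

object.size() does not follow the environment, which is why it reports only the few hundred bytes of the formula itself.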

/Henrik

On Wed, Oct 20, 2021 at 12:57 PM Duncan Murdoch
<murdoch.duncan using gmail.com> wrote:
>
> On 20/10/2021 9:20 a.m., Jinsong Zhao wrote:
> > On 2021/10/20 21:05, Duncan Murdoch wrote:
> >> On 20/10/2021 8:57 a.m., Jinsong Zhao wrote:
> >>> Hi there,
> >>>
> >>> I have an RData file, created by save.image(), with a size of about
> >>> 74.0 MB (77,608,222 bytes).
> >>>
> >>> After loading it into R, I measured the size of each object with object.size():
> >>>
> >>>> object.size(combn.rda.m)
> >>> 105448 bytes
> >>>> object.size(cross)
> >>> 102064 bytes
> >>>> object.size(denitr.1)
> >>> 25032 bytes
> >>>> object.size(rda.denitr.1)
> >>> 600280 bytes
> >>>> object.size(xh)
> >>> 7792 bytes
> >>>> object.size(xh.x)
> >>> 6064 bytes
> >>>> object.size(xh.x.1)
> >>> 24144 bytes
> >>>> object.size(xh.x.2)
> >>> 24144 bytes
> >>>> object.size(xh.x.3)
> >>> 24144 bytes
> >>>> object.size(xh.y)
> >>> 2384 bytes
> >>>
> >>> These are all small objects.
> >>>
> >>> If I delete the largest one, "rda.denitr.1", and then call save.image("xx.RData"),
> >>> the file has a size of 22.6 KB (23,244 bytes). All seems OK.
> >>>
> >>> However, when I save(rda.denitr.1, file = "yy.RData"), then it has the
> >>> size of 73.9 MB (77,574,869 bytes).
> >>>
> >>> I don't know why...
> >>>
> >>> Any hint?
> >>
> >> As the docs for object.size() say, "Exactly which parts of the memory
> >> allocation should be attributed to which object is not clear-cut."  In
> >> particular, if a function or formula has an associated environment, it
> >> isn't included, but it is sometimes saved in the image.
> >>
> >> So I'd suspect rda.denitr.1 contains something that references an
> >> environment, and it's an environment that would be saved.  (I forget the
> >> exact rules, but I think that means it's not the global environment and
> >> it's not a package environment.)
> >>
> >> Duncan Murdoch
> >
> >
> > rda.denitr.1 is only a list of length 2:
> > rda.denitr.1[[1]] is a vector of length 10;
> > rda.denitr.1[[2]] is a list of length 10. rda.denitr.1[[2]][[1]]
> > to rda.denitr.1[[2]][[10]] are small RDA objects generated by rda() from
> > the vegan package.
> >
> > If I run
> >   > a <- rda.denitr.1[[2]][[1]]
> >   > object.size(a)
> > 59896 bytes
> >   > save(a, file = "abc.RData")
> > the resulting file is also large, at 73.9 MB (77,536,611 bytes).
> >
> > Jinsong
> >
>
> The rda() function uses formulas.  If it saves the formula in the
> result, then it references the environment of that formula, typically
> the environment where the formula was created.
>
> Duncan Murdoch
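
One possible workaround, sketched on the toy make_formula() example rather than on a real vegan object (where rda() keeps its formula is not shown in this thread): point the formula's environment at the global environment before saving. The names small, size_before and size_after are only for illustration, and this is an assumption-laden sketch, since code that later looks up variables through the formula's environment may stop working.

```r
make_formula <- function() { large <- rnorm(1e6); x ~ y }
formula <- make_formula()
size_before <- length(serialize(formula, connection = NULL))

# Work on a copy so the original formula is left untouched
small <- formula

# The global environment is serialized as a reference, not by value,
# so the 'large' vector is no longer dragged along
environment(small) <- globalenv()
size_after <- length(serialize(small, connection = NULL))

c(before = size_before, after = size_after)
```

Comparing the two sizes should make it clear whether the environment was the source of the bulk; the same check could be tried on a copy of the real object before committing to it.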