[Rd] Need for garbage collection after creating object

Henrik Bengtsson hb at stat.berkeley.edu
Tue Feb 5 19:45:34 CET 2008


On Feb 5, 2008 10:12 AM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> On Feb 5, 2008 8:01 AM, Iago Mosqueira <iago.mosqueira at gmail.com> wrote:
> > Hello,
> >
> > After experiencing some difficulties with large arrays, I was surprised
> > to see the apparent need for calls to gc() after creating fairly large
> > arrays. For example, calling
> >
> > a<-array(2, dim=c(10,10,10,10,10,100))
> >
> > makes the memory usage of a fresh session of R jump from 13.8 Mb to
> > 166.4 Mb. A call to gc() brought it down to 90.8 Mb,
> >
> >  > gc()
> >             used (Mb) gc trigger  (Mb) max used  (Mb)
> > Ncells   132619  3.6     350000   9.4   350000   9.4
> > Vcells 10086440 77.0   21335887 162.8 20086792 153.3
> >
> > as expected from
> >
> >  > object.size(a)
> >
> > [1] 80000136
>
> I think the reason for this is that array() has to "expand" the input
> data to the right length internally;
>
>  data <- rep(data, length.out = vl)
>
> That result is a so-called "NAMED" object internally, and when the following call to
>
>   dim(data) <- dim
>
> occurs, the safest thing R can do is to create a copy. [Anyone,
> correct me if I'm wrong].
>
> If you expand the input data yourself, you won't see that extra copy, e.g.
>
>   data <- 2
>   dim <- c(10,10,10,10,10,100)
>   data <- rep(data, length.out=prod(dim))
>   a <- array(data, dim=dim)

My bad here; that does indeed create an extra copy; rep() is the
problem, and you can see that if you call gc() right after the rep().
It seems to be hard to allocate an array with values without creating
an extra copy, e.g.

dim <- c(10,10,10,10,10,100)
data <- numeric(prod(dim))
dim(data) <- dim

will not create an extra copy, but as soon as you try to set a value a
copy is made, e.g.

data[1,2,3,4,5,6] <- 2

Again, I believe this has to do with the fact that R takes the
safest path possible and does not risk overwriting an existing object
in memory (R is copy-by-value).  Note that when you do a second
assignment, that "safety copy" has already been created, so no more
copies are made, e.g. calling

data[1,2,3,4,5,7] <- 3

after the above will not create an extra copy.
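
If you want to watch for these copies directly, tracemem() can be used
in builds where it is available (i.e. R compiled with
--enable-memory-profiling); exactly which step gets reported as a
duplication may differ between R versions, but a sketch along these
lines will show where it happens:

dim <- c(10,10,10,10,10,100)
data <- numeric(prod(dim))   # single allocation, all zeros
dim(data) <- dim             # set the dim attribute
tracemem(data)               # report any duplication of 'data' from here on
data[1,2,3,4,5,6] <- 2       # the "safety copy" described above is reported here (if one is made)
data[1,2,3,4,5,7] <- 3       # no further copy should be reported
untracemem(data)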

/Henrik
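
P.S. In practice the pattern suggested further down in the quoted text
is simply to call gc() right after the big allocation, e.g.

a <- array(2, dim=c(10,10,10,10,10,100))
gc()   # free the intermediate copy promptly; this should reduce the
       # fragmentation risk mentioned below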

>
> >
> > Do I need to call gc() after creating every large array, or can I set up
> > the system to do this more often or efficiently?
>
> The R garbage collector will free/deallocate that memory when
> "needed".  However, calling gc() explicitly should minimize the risk
> of over-fragmented memory.  Basically, if there are several blocks of
> garbage memory hanging around, you might end up in a situation where
> you have a lot of *total* memory available, but you can only
> allocate small chunks of memory at any time.  Even calling gc() in
> that situation will not help; there is no mechanism that defragments
> memory in R.  So calling gc() after large allocations adds some
> protection against that.
>
> /Henrik
>
>
> >
> > Thanks very much,
> >
> >
> > Iago
> >
> >
> > $platform
> > [1] "i686-pc-linux-gnu"
> > $version.string
> > [1] "R version 2.6.1 (2007-11-26)"
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>


