[R] rbind wastes memory

Duncan Murdoch murdoch at stats.uwo.ca
Mon May 30 16:08:01 CEST 2005


lutz.thieme at amd.com wrote:
> Hello everybody,
> 
> If I try to (r)bind a number of large data frames, I run out of memory because R
> wastes memory and seems to "forget" to release it.
> 
> For example, I have 10 files.  Each file contains a large data frame "ds" (3500 columns
> by 800 rows) which needs ~20 MB of RAM if it is loaded as the only object.
> Now I try to bind all the data frames into one large one and need more than 1165 MB (!)
> of RAM (to simplify the R code, I use the same file ten times):
> 
> ________ start example 1 __________
> load(myFile)
> ds.tmp <- ds
> for (Cycle in 1:10) {
>     ds.tmp <- rbind(ds.tmp, ds)
> }
> ________ end example 1 __________
> 
> 
> 
> Stepping into the details, I found the following (the comment shows RAM usage after
> each line was executed):
> load(myFile)              # 40 MB (19 MB for R itself)
> ds.tmp <- ds              # 40 MB; only a pointer seems to be copied
> x <- rbind(ds.tmp, ds)    # 198 MB
> x <- rbind(ds.tmp, ds)    # 233 MB; the same instruction a second time uses
>                           # another 35 MB - why?

I'm guessing your problem is fragmented memory.  You are creating big 
objects, then making them bigger: each rbind() builds a brand-new, larger 
data frame and copies everything into it, so the previous copy becomes 
garbage.  R then has to go looking for ever larger allocations for the 
replacements, but they won't fit in the holes left by the objects you've 
discarded, so those holes stay empty and total memory use keeps climbing.
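
You can see this happening by rerunning the loop from your example and printing
gc() on each pass (a quick sketch of mine, reusing your myFile/ds, not code from
the original post):

load(myFile)                      # same file / "ds" object as in example 1
ds.tmp <- ds
for (Cycle in 1:10) {
     ds.tmp <- rbind(ds.tmp, ds)  # each call builds a new, larger data frame
     print(gc())                  # the "max used" figures keep climbing even though
                                  # the superseded copies are no longer reachable
}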

A solution to this is to use two passes:  first figure out how much 
space you need, then allocate it and fill it.  E.g., reusing the 
myFile/ds from your example:

rows <- integer(10)
for (Cycle in 1:10) {
     load(myFile)                 # the file for this cycle (your example reuses one file)
     rows[Cycle] <- nrow(ds)      # some calculation based on the data
}

## allocate the full result once; "ds" from the last load() serves as a template
## (equivalently, build it explicitly: data.frame(x = double(sum(rows)), ...))
ds.tmp <- ds[rep(NA_integer_, sum(rows)), ]

offset <- 0
for (Cycle in 1:10) {
     load(myFile)
     ds.tmp[offset + (1:nrow(ds)), ] <- ds   # fill the appropriate rows in place
     offset <- offset + nrow(ds)
}
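
Written this way the big object is allocated just once, up front, instead of
being replaced by a slightly larger copy on every cycle, so there is nothing
left behind to fragment the heap.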


Duncan Murdoch



