[R] rbind wastes memory

Douglas Bates bates at stat.wisc.edu
Mon May 30 16:03:04 CEST 2005


lutz.thieme at amd.com wrote:
> Hello everybody,
> 
> if I try to rbind() a number of large data frames, I run out of memory
> because R wastes memory and seems to "forget" to release it.
> 
> For example, I have 10 files. Each file contains a large data frame "ds"
> (3500 columns by 800 rows) which needs ~20 MB of RAM when loaded as the only
> object. Now I try to bind all the data frames into one large one and need more
> than 1165 MB (!) of RAM (to simplify the R code, I use the same file ten times):
> 
> ________ start example 1 __________
> load(myFile)
> ds.tmp <- ds
> for (Cycle in 1:10) {
>     ds.tmp <- rbind(ds.tmp, ds)
> }
> ________ end example 1 __________
> 
> 
> 
> Stepping into the details, I found the following (the comment shows RAM usage
> after the line was executed):
> load(myFile)            # 40MB (19MB for R itself)
> ds.tmp <- ds            # 40MB; => only a pointer seems to be copied
> x <- rbind(ds.tmp, ds)  # 198MB
> x <- rbind(ds.tmp, ds)  # 233MB; the same instruction a second time leads to
>                         # 35MB more RAM usage - why?
> 
> 
> Now I played around but couldn't find a solution. For example, I bound each
> data frame step by step, removed the variables and cleared memory, but I
> still need 1140 MB (!) of RAM:
> 
> ________ start example 2 __________
> tmpFile <- paste(myFile, '.tmp', sep="")
> load(myFile)
> ds.tmp <- ds
> save(ds.tmp, file=tmpFile, compress=TRUE)
> 
> for (Cycle in 1:10) {
>     ds     <- NULL
>     ds.tmp <- NULL
>     rm(ds, ds.tmp)
>     gc()
>     load(tmpFile)
>     load(myFile)
>     ds.tmp <- rbind(ds.tmp, ds)
>     save(ds.tmp, file=tmpFile, compress=TRUE)
>     cat(Cycle, ': ', object.size(ds), object.size(ds.tmp), '\n')
> }
> ________ end example 2 __________
> 
> 
> platform i386-pc-solaris2.8
> arch     i386              
> os       solaris2.8        
> system   i386, solaris2.8  
> status                     
> major    1                 
> minor    9.1               
> year     2004              
> month    06                
> day      21                
> language R       
> 
> 
> 
> 
> How can I avoid running into this memory problem? Any ideas are much appreciated.
> Thank you in advance & kind regards,

If you are going to look at memory usage, you should call gc(), and perhaps
call it repeatedly, before checking the memory footprint. This forces a
garbage collection, so the footprint reflects objects that are still live
rather than memory that has been freed but not yet reclaimed.
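
For example (a minimal sketch; the comments describe what I would expect,
not measured values from your session):

x <- rbind(ds.tmp, ds)
rm(x)
gc(); gc()    # repeated collections can release memory a single pass misses
gc()          # the 'used' column of the returned matrix now shows live objects only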

Also, you will probably save memory by treating your data frames as
lists and concatenating them, then converting the result to a data frame.
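
A rough sketch of that idea, assuming all the data frames have identical,
purely atomic columns (factor columns would need extra handling), and reusing
your trick of taking the same data frame ten times:

dfs <- rep(list(ds), 10)    # the list of data frames to combine
cols <- lapply(names(dfs[[1]]), function(nm)
    unlist(lapply(dfs, "[[", nm), use.names = FALSE))
names(cols) <- names(dfs[[1]])
big <- as.data.frame(cols)

Each column of the result is built exactly once, so you avoid repeatedly
copying the whole accumulated object the way the rbind() loop does.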



