[R] Why does loading saved/cached objects add significantly to RAM consumption?

Janko Thyson janko.thyson.rstuff at googlemail.com
Thu Sep 1 11:59:13 CEST 2011


On 30.08.2011 20:33, Henrik Bengtsson wrote:
> Hi.
>
> On Tue, Aug 30, 2011 at 3:59 AM, Janko Thyson
> <janko.thyson.rstuff at googlemail.com>  wrote:
>> Dear list,
>>
>> I make use of cached objects extensively for time consuming computations and
>> yesterday I happened to notice some very strange behavior in that respect:
>> When I execute a given computation whose result I'd like to cache (tried
>> both saving it as '.Rdata' and via package 'R.cache', which uses its own
>> filetype '.Rcache'),
> Just to clarify, it is just the filename extension that is "custom";
> it uses base::save() internally.  It is very unlikely that R.cache has
> to do with your problem.

Okay, got it.

>
>> my R session consumes about 200 MB of RAM, which is
>> fine. Now, when I make use of the previously cached object (i.e. loading it,
>> assigning it to a certain field of a Reference Class object), I noticed that
>> RAM consumption of my R process jumps to about 250 MB!
>>
>> Each new loading of cached/saved objects adds to that consumption (in total,
>> I have about 5-8 objects that are processed this way), so at some point I
>> easily get a RAM consumption of over 2 GB where I'm only at about 200 MB of
>> consumption when I compute each object directly! Object sizes (checked with
>> 'object.size()') remain fairly constant. What's even stranger: after loading
>> cached objects and removing them (either via 'rm()' or by assigning a
>> 'fresh' empty object to the respective Reference Class field), RAM
>> consumption remains at this high level and never comes down again.
>>
>> I checked the behavior also in a small example which is a simplification of
>> my use case and which you'll find below (checked both on Win XP and Win 7 32
>> bit). I couldn't quite reproduce an immediate increase in RAM consumption,
> I couldn't reproduce it either using sessionInfo():
>
> R version 2.13.1 Patched (2011-08-29 r56823)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
> [1] tools_2.13.1

I'll try to come up with an example that more closely resembles my actual 
use case.

>> but what I still find really strange is
>> a) why do repeated 'load()' calls result in an increase in RAM consumption?
>> b) why does the latter not go down again after the objects have been removed
>> from '.GlobalEnv'?

Thanks for the hint about an explicit call to 'gc()'. That brings down 
memory usage and would work if I didn't need the "content" of the 
objects I load and could therefore remove them ('rm(x)'; 'gc()'). But 
that's exactly what I need: load data and assign it to some environments.
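
To make that concrete, here is a minimal sketch (not my actual code; 
'cache_file', 'target' and the vector size are made up for illustration, 
and a temporary file stands in for a real cache path). It loads a cached 
object straight into an environment and compares what 'gc()' reports 
while the data is still needed with what it reports once the data is 
removed:

```r
# Hypothetical names throughout; tempfile() stands in for a real cache file.
cache_file <- tempfile(fileext = ".Rdata")
x <- rnorm(1e6)                        # roughly 8 MB of doubles
save(x, file = cache_file)
rm(x)

# Load directly into the environment that should hold the data,
# so no extra binding is created in .GlobalEnv.
target <- new.env()
loaded <- load(cache_file, envir = target)  # load() returns the restored names

# gc() runs a collection and reports usage; column 2 is MB currently in use.
mb_with_data <- gc()["Vcells", 2]      # 'target$x' is still alive here

# Only once nothing refers to the data can the collector reclaim it.
rm(list = loaded, envir = target)
mb_after_rm <- gc()["Vcells", 2]

unlink(cache_file)
```

As long as 'target' holds the object, the memory stays in use no matter 
how often 'gc()' is called, which is exactly my situation.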

> Removed objects may still sit in memory - it is only when R's garbage
> collector (GC) comes around and removes them that the memory usage
> goes down.  You can force the garbage collector to run by calling
> gc(), but normally it is automatically triggered whenever needed.
>
> Note that the GC will only be able to clean up the memory of removed
> objects IFF there are no other references to that object/piece of
> memory.  When you use Reference Classes (cf. setRefClass()) and
> environments, you end up keeping references internally in objects
> without being aware of it.  My guess is that your other code may have
> such issues, whereas the code below does not.
>
> There is also the concept of "promises" [see 'R Language Definition'
> document], which *may* also be involved.
>
> FYI, the Sysinternals Process Explorer
> [http://technet.microsoft.com/en-us/sysinternals/bb896653] is a useful
> tool for studying individual processes such as R.

Thanks for that one as well! I'll have a more detailed look into this.
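
On the point about lingering references, a small sketch of how I 
understand it, assuming plain environments rather than my real Reference 
Class setup ('payload' and 'holder' are hypothetical names): 'rm()' on 
one name does not free anything while a second reference to the same 
data exists.

```r
# Hypothetical illustration with plain environments (no Reference Classes).
payload <- rnorm(1e6)                       # roughly 8 MB of doubles
holder  <- new.env()
assign("cached", payload, envir = holder)   # second reference to the same data

mb_both_refs <- gc()["Vcells", 2]           # MB in use, both references alive

rm(payload)                                 # drop only the .GlobalEnv binding
mb_one_ref <- gc()["Vcells", 2]             # data still reachable via 'holder'

rm("cached", envir = holder)                # drop the last reference
mb_no_refs <- gc()["Vcells", 2]             # now the collector can reclaim it
```

Comparing the three values shows whether memory drops after the first 
'rm()' or only after the last reference is gone.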

Best regards,
Janko

> My $.02
>
> Henrik
>
>> Has any of you experienced similar behavior? Or even better, does anyone
>> know why this is happening and how it might be fixed (or be worked around)?
>> ;-)
>>
>> I really need your help on this one as it's crucial for my thesis. Thanks a
>> lot to anyone replying!
>>
>> Regards,
>> Janko
>>
>> ##### EXAMPLE #####
>>
>> setRefClass("A", fields=list(.PRIMARY="environment"))
>> setRefClass("Test", fields=list(a="A"))
>>
>> obj.1<- lapply(1:5000, function(x){
>>     rnorm(x)
>> })
>> names(obj.1)<- paste("sample", 1:5000, sep=".")
>> obj.1<- as.environment(obj.1)
>>
>> test<- new("Test", a=new("A", .PRIMARY=obj.1))
>> test$a$.PRIMARY$sample.10
>>
>> #+++++
>>
>> object.size(test)
>> object.size(test$a)
>> object.size(obj.1)
>> # RAM used by R session: 118 MB
>>
>> save(obj.1, file="C:/obj.1.Rdata")
>> # Results in an object of ca. 94 MB
>> save(test, file="C:/test.Rdata")
>> # Results in an object of ca. 94 MB
>>
>> ##### START A NEW R SESSION #####
>>
>> load("C:/test.Rdata")
>> # RAM consumption still fine at 115 - 118 MB
>>
>> # But watch how it goes up as we repeatedly load objects
>> for(x in 1:5){
>>     load("C:/test.Rdata")
>> }
>> for(x in 1:5){
>>     load("C:/obj.1.Rdata")
>> }
>> # Somehow there seems to be an upper limit, though
>>
>> # Removing the objects does not bring down RAM consumption
>> rm(obj.1)
>> rm(test)
>>
>> ##########
>>
>>> Sys.info()
>>                      sysname                      release
>>                    "Windows"                         "XP"
>>                      version                     nodename
>> "build 2600, Service Pack 3"               "ASHB-109C-02"
>>                      machine                        login
>>                        "x86"                     "wwa418"
>>                         user
>>                     "wwa418"
>>
>>> sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
>> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
>> [5] LC_TIME=German_Germany.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] codetools_0.2-8 tools_2.13.1
>>
>>