[R] Memory filling up while looping

Peter Meißner peter.meissner at uni-konstanz.de
Fri Dec 21 18:41:22 CET 2012


Yeah, thanks,
I know: !DO NOT USE RBIND! !

But it does not help: even when I use a predefined list to store the results, 
as suggested there, the memory usage keeps growing.
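
For reference, this is roughly the pattern I mean by a predefined list. It is only a minimal sketch: parse_one() is a made-up stand-in for the real per-file work, and n = 100 is an arbitrary chunk size.

```r
# Hypothetical stand-in for the per-file parsing step;
# returns a small character matrix with columns "id" and "result".
parse_one <- function(i) {
  cbind(id = as.character(i), result = paste("body", seq_len(2)))
}

n <- 100
results <- vector("list", n)          # allocate all slots once, up front
for (i in seq_len(n)) {
  results[[i]] <- parse_one(i)        # fill one slot per iteration
}
combined <- do.call(rbind, results)   # bind once, after the loop
dim(combined)
```

This avoids growing the result object inside the loop, but as said, it does not change the memory behaviour I see.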

The problem seems to stem from the XML package and not from the way I 
store the data until it is saved.
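
One thing that might be worth trying, assuming the growth comes from C-level documents that are never released, is to parse into an internal document and free it explicitly. The sketch below is untested against the real data. The file content and the helper name parse_file() are made up for illustration, and I cannot say whether free() actually releases the memory on every platform.

```r
library(XML)

# Write one small file so the sketch is self-contained (content is made up).
f <- tempfile(fileext = ".xml")
writeLines(c('<?xml version="1.0"?>', "<note>", "<nr>1</nr>",
             "<body>first</body>", "<body>second</body>", "</note>"), f)

# Hypothetical helper: parse one file and return only plain character data.
parse_file <- function(filename) {
  doc <- xmlParse(filename)             # internal (C-level) document
  on.exit(free(doc), add = TRUE)        # release the document when leaving
  id     <- xpathSApply(doc, "//nr", xmlValue)
  result <- xpathSApply(doc, "//body", xmlValue)
  cbind(id, result)                     # character matrix, no node references
}

out <- parse_file(f)
```

The idea is that only character vectors leave the function, so nothing in the workspace keeps the parsed document alive.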

Best, Peter



On 21.12.2012 18:33, Patrick Burns wrote:
> Circle 2 of 'The R Inferno' may help you.
>
> http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
>
> In particular, it has an example of how to do what
> Duncan suggested.
>
> Pat
>
>
> On 21/12/2012 15:27, Peter Meißner wrote:
>> Here is a working example that reproduces the behavior by creating 1000
>> XML files and afterwards parsing them.
>>
>> On my PC, R starts with about 90MB of RAM; with every cycle another
>> 10-12MB are added to the RAM usage, so I end up with about 200MB of RAM
>> usage.
>>
>> In the real code one chunk-cycle eats about 800MB of RAM, which was one
>> of the reasons I decided to split up the process into separate chunks in
>> the first place.
>>
>> ----------------
>> 'Minimal'Example - START
>> ----------------
>>
>> # the general problem
>> require(XML)
>>
>> chunk <- function(x, chunksize){
>>              # source: http://stackoverflow.com/a/3321659/1144966
>>              x2 <- seq_along(x)
>>              split(x, ceiling(x2/chunksize))
>>          }
>>
>>
>>
>> chunky <- chunk(paste("test",1:1000,".xml",sep=""),100)
>>
>> for(i in 1:1000){
>>      writeLines(c(paste('<?xml version="1.0"?>\n <note>\n
>> <to>Tove</to>\n    <nr>',i,'</nr>\n    <from>Jani</from>\n
>> <heading>Reminder</heading>\n    ',sep=""), paste(rep('<body>Do not
>> forget me this weekend!</body>\n',sample(1:10, 1)),sep="" ) , ' </note>')
>>      ,paste("test",i,".xml",sep=""))
>> }
>>
>> for(k in 1:length(chunky)){
>>      gc()
>>      print(chunky[[k]])
>>      xmlCatcher <- NULL
>>
>>      for(i in 1:length(chunky[[k]])){
>>          filename    <- chunky[[k]][i]
>>          xml         <- xmlTreeParse(filename)
>>          xml         <- xmlRoot(xml)
>>          result      <- sapply(getNodeSet(xml,"//body"), xmlValue)
>>          id          <- sapply(getNodeSet(xml,"//nr"), xmlValue)
>>          dummy       <- cbind(id,result)
>>          xmlCatcher  <- rbind(xmlCatcher,dummy)
>>          }
>>      save(xmlCatcher,file=paste("xmlCatcher",k,".RData",sep=""))
>> }
>>
>> ----------------
>> 'Minimal'Example - END
>> ----------------
>>
>>
>>
>> On 21.12.2012 15:14, jim holtman wrote:
>>> Can you send either your actual script or the console output so I can
>>> get an idea of how fast memory is growing.  Also at the end, can you
>>> list the sizes of the objects in the workspace.  Here is a function I
>>> use to get the space:
>>>
>>> my.ls <-
>>> function (pos = 1, sorted = FALSE, envir = as.environment(pos))
>>> {
>>>      .result <- sapply(ls(envir = envir, all.names = TRUE),
>>> function(..x) object.size(eval(as.symbol(..x),
>>>          envir = envir)))
>>>      if (length(.result) == 0)
>>>          return("No objects to list")
>>>      if (sorted) {
>>>          .result <- rev(sort(.result))
>>>      }
>>>      .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` =
>>> sum(.result)))
>>>      names(.ls) <- "Size"
>>>      .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0,
>>>          format = "f")
>>>      .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>> function(x) class(eval(as.symbol(x),
>>>          envir = envir))[1L])), "-------")
>>>      .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>>          function(x) length(eval(as.symbol(x), envir = envir)))),
>>>          "-------")
>>>      .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)], function(x)
>>> paste(dim(eval(as.symbol(x),
>>>          envir = envir)), collapse = " x "))), "-------")
>>>      .ls
>>> }
>>>
>>>
>>> which gives output like this:
>>>
>>>> my.ls()
>>>                   Size       Class  Length     Dim
>>> .Last             736    function       1
>>> .my.env.jph        28 environment      39
>>> x                 424     integer     100
>>> y              40,024     integer   10000
>>> z           4,000,024     integer 1000000
>>> **Total     4,041,236     ------- ------- -------
>>>
>>>
>>> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
>>> <peter.meissner at uni-konstanz.de> wrote:
>>>> Thanks for your answer,
>>>>
>>>> yes, I tried 'gc()'; it did not change the behavior.
>>>>
>>>> best, Peter
>>>>
>>>>
>>>> On 21.12.2012 13:37, jim holtman wrote:
>>>>>
>>>>> Have you tried putting calls to 'gc' at the top of the first loop to
>>>>> make sure memory is reclaimed? You can print the result of the call to
>>>>> 'gc' to see how fast memory use is growing.
>>>>>
>>>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>>>> <peter.meissner at uni-konstanz.de> wrote:
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> I have a double loop like this:
>>>>>>
>>>>>>
>>>>>> chunk <- list(1:10, 11:20, 21:30)
>>>>>> for(k in 1:length(chunk)){
>>>>>>           print(chunk[k])
>>>>>>           DummyCatcher <- NULL
>>>>>>           for(i in chunk[[k]]){
>>>>>>                   print("i load something")
>>>>>>                   dummy <- 1
>>>>>>                   print("i do something")
>>>>>>                   dummy <- dummy + 1
>>>>>>                   print("i do put it together")
>>>>>>                   DummyCatcher = rbind(DummyCatcher, dummy)
>>>>>>           }
>>>>>>           print("i save a chunk and restart with another chunk of data")
>>>>>> }
>>>>>>
>>>>>> The problem now is that with each 'chunk'-cycle the memory used by R
>>>>>> becomes bigger and bigger until it exceeds my RAM, although the RAM
>>>>>> needed for any one of the chunk-cycles alone is only about 1/5 of
>>>>>> what I have overall.
>>>>>>
>>>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>>>> all the objects (like 'DummyCatcher') are reused every cycle, so I
>>>>>> would assume that the RAM used should stay about the same after the
>>>>>> first 'chunk' cycle.
>>>>>>
>>>>>>
>>>>>> Best, Peter
>>>>>>
>>>>>>
>>>>>> SystemInfo:
>>>>>>
>>>>>> R version 2.15.2 (2012-10-26)
>>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>> Win7 Enterprise, 8 GB RAM
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Peter Meißner
>>>> Workgroup 'Comparative Parliamentary Politics'
>>>> Department of Politics and Administration
>>>> University of Konstanz
>>>> Box 216
>>>> 78457 Konstanz
>>>> Germany
>>>>
>>>> +49 7531 88 5665
>>>> http://www.polver.uni-konstanz.de/sieberer/home/
>>>
>>>
>>>
>>
>

-- 
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216
78457 Konstanz
Germany

+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/



