[R] Memory filling up while looping

Peter Meißner peter.meissner at uni-konstanz.de
Fri Dec 21 16:27:41 CET 2012


Here is a working example that reproduces the behavior by creating 1000 
XML files and then parsing them.

On my PC, R starts out at about 90 MB of RAM; with every cycle another 
10-12 MB are added to the RAM usage, so I end up at around 200 MB.
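
For what it's worth, here is a minimal sketch of how the growth per cycle 
could be tracked from within R -- this is not the original measurement 
code, and the helper name 'track_mem' is made up for illustration. Note 
that gc() only counts R's own objects, so it will not necessarily match 
what the operating system reports for the R process (memory held by C 
libraries loaded into the process does not show up here).

## minimal sketch, not the original measurement code
track_mem <- function(label){
    mem <- gc()                         # matrix with Ncells/Vcells rows
    cat(label, "- Mb used by R objects:", sum(mem[, 2]), "\n")
    invisible(mem)
}

for(k in 1:3){                          # stand-in for the chunk loop
    track_mem(paste("cycle", k))
    x <- matrix(rnorm(1e6), ncol = 10)  # stand-in for the real work
}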

In the real code, one chunk cycle eats about 800 MB of RAM, which was one 
of the reasons I decided to split the process into separate chunks in 
the first place.

----------------
'Minimal' Example - START
----------------

# the general problem
require(XML)

# split a vector into chunks of (at most) 'chunksize' elements
chunk <- function(x, chunksize){
    # source: http://stackoverflow.com/a/3321659/1144966
    x2 <- seq_along(x)
    split(x, ceiling(x2/chunksize))
}

# 1000 file names, split into chunks of 100
chunky <- chunk(paste("test", 1:1000, ".xml", sep=""), 100)

# write 1000 small test XML files
for(i in 1:1000){
    writeLines(
        c(paste('<?xml version="1.0"?>\n<note>\n<to>Tove</to>\n<nr>', i,
                '</nr>\n<from>Jani</from>\n<heading>Reminder</heading>', sep=""),
          rep('<body>Do not forget me this weekend!</body>', sample(1:10, 1)),
          '</note>'),
        paste("test", i, ".xml", sep=""))
}

# parse the files chunk by chunk, collect the values, save each chunk
for(k in 1:length(chunky)){
    gc()
    print(chunky[[k]])
    xmlCatcher <- NULL

    for(i in 1:length(chunky[[k]])){
        filename   <- chunky[[k]][i]
        xml        <- xmlTreeParse(filename)
        xml        <- xmlRoot(xml)
        result     <- sapply(getNodeSet(xml, "//body"), xmlValue)
        id         <- sapply(getNodeSet(xml, "//nr"), xmlValue)
        dummy      <- cbind(id, result)
        xmlCatcher <- rbind(xmlCatcher, dummy)
    }
    save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
}

----------------
'Minimal' Example - END
----------------
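
A variant I could imagine as a workaround is sketched below -- it is not 
the code that produced the numbers above, and it assumes that xmlParse() 
and free() from the XML package behave as documented. It parses into an 
internal document, releases it explicitly with free(), and collects the 
per-file results in a list instead of growing xmlCatcher with rbind():

## sketch only, assumptions as stated above
require(XML)

for(k in 1:length(chunky)){
    gc()
    print(chunky[[k]])
    rows <- vector("list", length(chunky[[k]]))   # pre-allocate, no rbind in the loop

    for(i in 1:length(chunky[[k]])){
        filename  <- chunky[[k]][i]
        doc       <- xmlParse(filename)           # internal (C-level) document
        result    <- sapply(getNodeSet(doc, "//body"), xmlValue)
        id        <- sapply(getNodeSet(doc, "//nr"), xmlValue)
        rows[[i]] <- cbind(id, result)
        free(doc)                                 # explicitly release the C-level tree
        rm(doc)
    }
    xmlCatcher <- do.call(rbind, rows)
    save(xmlCatcher, file = paste("xmlCatcher", k, ".RData", sep = ""))
}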



Am 21.12.2012 15:14, schrieb jim holtman:
> Can you send either your actual script or the console output so I can
> get an idea of how fast memory is growing.  Also at the end, can you
> list the sizes of the objects in the workspace.  Here is a function I
> use to get the space:
>
> my.ls <-
> function (pos = 1, sorted = FALSE, envir = as.environment(pos))
> {
>     .result <- sapply(ls(envir = envir, all.names = TRUE),
>         function(..x) object.size(eval(as.symbol(..x), envir = envir)))
>     if (length(.result) == 0)
>         return("No objects to list")
>     if (sorted) {
>         .result <- rev(sort(.result))
>     }
>     .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
>     names(.ls) <- "Size"
>     .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0, format = "f")
>     .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) class(eval(as.symbol(x), envir = envir))[1L])), "-------")
>     .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) length(eval(as.symbol(x), envir = envir)))), "-------")
>     .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) paste(dim(eval(as.symbol(x), envir = envir)),
>             collapse = " x "))), "-------")
>     .ls
> }
>
>
> which gives output like this:
>
>> my.ls()
>                   Size       Class  Length     Dim
> .Last             736    function       1
> .my.env.jph        28 environment      39
> x                 424     integer     100
> y              40,024     integer   10000
> z           4,000,024     integer 1000000
> **Total     4,041,236     ------- ------- -------
>
>
> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
> <peter.meissner at uni-konstanz.de> wrote:
>> Thanks for your answer,
>>
>> Yes, I tried 'gc()'; it did not change the behavior.
>>
>> best, Peter
>>
>>
>> Am 21.12.2012 13:37, schrieb jim holtman:
>>>
>>> Have you tried putting calls to 'gc' at the top of the first loop to
>>> make sure memory is reclaimed? You can print the result of the call to
>>> 'gc' to see how fast it is growing.
>>>
>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>> <peter.meissner at uni-konstanz.de> wrote:
>>>>
>>>> Hey,
>>>>
>>>> I have a double loop like this:
>>>>
>>>>
>>>> chunk <- list(1:10, 11:20, 21:30)
>>>> for(k in 1:length(chunk)){
>>>>     print(chunk[[k]])
>>>>     DummyCatcher <- NULL
>>>>     for(i in chunk[[k]]){
>>>>         print("i load something")
>>>>         dummy <- 1
>>>>         print("i do something")
>>>>         dummy <- dummy + 1
>>>>         print("i do put it together")
>>>>         DummyCatcher <- rbind(DummyCatcher, dummy)
>>>>     }
>>>>     print("i save a chunk and restart with another chunk of data")
>>>> }
>>>>
>>>> The problem now is that with each 'chunk' cycle the memory used by R
>>>> becomes bigger and bigger until it exceeds my RAM, even though any
>>>> single chunk cycle on its own needs only about a fifth of what I have
>>>> overall.
>>>>
>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>> all the objects (like 'DummyCatcher') are reused in every cycle, so I
>>>> would assume that the RAM used should stay about the same after the
>>>> first 'chunk' cycle.
>>>>
>>>>
>>>> Best, Peter
>>>>
>>>>
>>>> SystemInfo:
>>>>
>>>> R version 2.15.2 (2012-10-26)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>> Win7 Enterprise, 8 GB RAM
>>>>
>>>
>>>
>>>
>>>
>>

-- 
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216
78457 Konstanz
Germany

+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/



