[R] memory leak using XML readHTMLTable

J Toll jctoll at gmail.com
Mon Sep 17 05:30:42 CEST 2012


Hi,

I'm using the XML package to scrape data and I'm trying to figure out
how to eliminate the memory leak I'm currently experiencing.  In the
searches I've done, it sounds like the existence of the leak is fairly
well known.  What isn't as clear is exactly how to solve it.  The
general process I'm using is this:

require(XML)

myFunction <- function(URL) {

  html <- readLines(URL)

  tables <- readHTMLTable(html, stringsAsFactors = FALSE)

  myData <- data.frame(Value = tables[[1]][, 2],
                       row.names = make.unique(tables[[1]][, 1]),
                       stringsAsFactors = FALSE)

  free(tables)                     # here, and
  rm(list = c("html", "tables"))   # here, my attempt to solve the memory leak

  return(myData)

}

x <- lapply(myURLs, myFunction)


I've tried using rm() and free() to release the memory each time the
function is called, but as far as I can tell it hasn't worked.  By the
time lapply() is finished working through my list of URLs, I'm
swapping about 3GB of memory.

I've also tried using gc(), but that also seems to have no effect on
the problem.
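
Since lapply() doesn't give me a hook between iterations, the gc()
attempt replaced it with an explicit loop, roughly like this:

x <- vector("list", length(myURLs))
for (i in seq_along(myURLs)) {
  x[[i]] <- myFunction(myURLs[[i]])
  gc()   # force a collection after each page; usage still climbs
}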

I'm running RStudio 0.96.330 and the latest version of the XML package.
R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
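
For what it's worth, here is the variant I'm planning to test next.
It assumes (and I may be wrong about this) that free() is meant for
the parsed document returned by htmlParse(), not for an ordinary R
list of data frames:

myFunction2 <- function(URL) {

  doc <- htmlParse(URL)    # external pointer to the C-level document
  tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

  myData <- data.frame(Value = tables[[1]][, 2],
                       row.names = make.unique(tables[[1]][, 1]),
                       stringsAsFactors = FALSE)

  free(doc)                # free the parsed document itself
  rm(doc, tables)

  return(myData)

}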

Any suggestions on how to solve this memory issue?  Thanks.


James


