[Rd] Significant memory leak when using XML on Windows

Janko Thyson janko.thyson at gmail.com
Mon Dec 15 12:12:00 CET 2014

Thanks a lot for answering. Before I get into it, please note that
everything below bears the big caption "Thanks for trying to help me at

1) Yeah, those examples - quite hard to satisfy everyone's needs ;-) While
some people complained that my past examples regarding this issue were not
informative enough, others didn't like the more elaborate version (as
seems to be the case for you). I simply tried to make it as easy as
possible for people to see what's actually going on, so they wouldn't have
to program their own stuff for things like reading the actual memory
consumed by the Rterm process etc. If you prefer plain vanilla, though, I
guess this would be it:

library(XML)

memoryLeak <- function(
  x = system.file("exampleData", "mtcars.xml", package="XML"),
  n = 5000,
  free_doc = FALSE,
  rm_doc = FALSE,
  use_gc = FALSE
) {
  # Parse the same file n times; optionally release each document explicitly
  lapply(1:n, function(ii) {
    doc <- xmlParse(x)
    if (free_doc) free(doc)
    if (rm_doc) rm(doc)
    if (use_gc) gc()
  })
}

2) If I knew my way around OSX or Linux, I would be happy to go with your
suggestions - but as I'm not, unfortunately that's out of reach for me. But
IMO, a deeper level of cross-platform expertise should **not** be a
general prerequisite before you can ask for help - even at r-devel (as
opposed to r-help). However, AFAIK from past conversations with Duncan, the
problem is indeed Windows-specific as on all his non-Windows infrastructure
(definitely Linux, possibly OSX), everything went fine.

3) The same goes for the level of expertise in C. After all, R is not C. I
totally agree that the more programming languages one knows, the better.
But again: I don't think that knowing your way around C should be a
prerequisite for asking for help when an *R function* interfacing C causes
trouble. Requesting this would sort of oppose R's nature/paradigm of being
an awesome "top-level" interfacing language. But I'll try to narrow the
problem down on a C-level if I can help you with that.

4) Both Duncan and Hadley have suggested that libxml2 is indeed
causing the problem. So trying to link against another build would possibly
be a great way to start! How would I go about that?
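
Regarding the "reading the actual memory consumed by the Rterm process"
part of 1): a minimal sketch of such a measurement could look like the
following. The helper name process_memory_mb() is hypothetical (it is not
part of XML or base R), and the sketch assumes that `tasklist` is available
on Windows, /proc on Linux, and `ps` on other Unix-alikes.

```r
# Hypothetical helper: resident memory of an R process in MB, as reported
# by the operating system rather than by R itself.
process_memory_mb <- function(pid = Sys.getpid()) {
  if (.Platform$OS.type == "windows") {
    # tasklist prints one CSV row: image name, PID, session name,
    # session number, memory usage (e.g. "123,456 K")
    out <- system2("tasklist",
                   c("/FI", shQuote(paste("PID eq", pid)), "/FO", "CSV", "/NH"),
                   stdout = TRUE)
    fields <- read.csv(text = out[1], header = FALSE,
                       colClasses = "character")
    # Keep only the digits of the memory column (value is in kilobytes)
    kb <- as.numeric(gsub("[^0-9]", "", fields[[5]]))
  } else if (file.exists(file.path("/proc", pid, "status"))) {
    # Linux: the VmRSS line looks like "VmRSS:   123456 kB"
    status <- readLines(file.path("/proc", pid, "status"))
    line <- grep("^VmRSS:", status, value = TRUE)[1]
    kb <- as.numeric(gsub("[^0-9]", "", line))
  } else {
    # Other Unix-alikes: ps reports the resident set size in kilobytes
    out <- system2("ps", c("-o", "rss=", "-p", pid), stdout = TRUE)
    kb <- as.numeric(trimws(out[1]))
  }
  kb / 1024
}

before <- process_memory_mb()
# memoryLeak(n = 5000, free_doc = TRUE, rm_doc = TRUE, use_gc = TRUE)
after <- process_memory_mb()
cat(sprintf("Process memory changed by %.1f MB\n", after - before))
```

The OS-level figure matters here because gc() only reports memory that R
itself manages; memory allocated by libxml2 at the C level would not show
up in its output. Since the helper does not depend on Windows, the same
script could be run on Linux or OSX to compare results.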

Thanks in advance if you take the time to look into this further!

On Mon, Dec 15, 2014 at 4:54 AM, Jeroen Ooms <jeroenooms at gmail.com> wrote:
> On Thu, Dec 11, 2014 at 12:13 PM, Janko Thyson <janko.thyson at gmail.com>
> wrote:
>> I'd so much appreciate if someone could have a look at this. If I can be
>> of any help whatsoever, please let me know!
>
> Your current code uses various functions from XML and rvest so it is not a
> *minimal* reproducible example. Even if you are unfamiliar with C, you
> should be able to investigate exactly which function in the XML package you
> think has issues. Once you found the problematic R function, inspect the
> source code or use debug() to see if you can narrow it down even further,
> preferably to a particular call to C.
>
> Moreover you should create a reproducible example that allows us (and you)
> to test if this problem appears on other systems such as OSX or Linux.
> Development and debugging on Windows is very painful so your Windows-only
> example is not too helpful. Making people use Windows is not a good
> strategy for getting help.
>
> If the "leak" does not appear on other systems, it is likely a problem in
> the libxml2 Windows library on CRAN. In that case we can try to link
> against another build. On the other hand, if the problem does appear across
> systems, and you have provided a minimal reproducible example that
> pinpoints the problematic C function, we can help you review/debug the C
> code to see if/where some allocated object is not properly freed.

