[Rd] Significant memory leak when using XML on Windows

Janko Thyson janko.thyson at gmail.com
Mon Dec 15 12:25:23 CET 2014

Sorry guys, didn't see your responses before sending mine.

Thanks, Jeroen!! I'll test your version today and get back to you.

Sent from my smartphone
On 15.12.2014 12:12, "Janko Thyson" <janko.thyson at gmail.com> wrote:

> Thanks a lot for answering. Before I get into it, please note that
> everything below bears the big caption "Thanks for trying to help me at
> all".
> 1) Yeah, those examples - quite hard to satisfy everyone's needs ;-) While
> one side complained that my past examples regarding this issue were not
> informative enough, others didn't like the more elaborate version (as
> seems to be the case for you). I simply tried to make it as easy as
> possible for people to see what's actually going on so they wouldn't have
> to program their own stuff for things like reading the actual memory
> consumed by the Rterm process etc. If you prefer plain vanilla, though, I
> guess this would be it:
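> library(XML)  # provides xmlParse() and free()
> memoryLeak <- function(
>   x = system.file("exampleData", "mtcars.xml", package="XML"),
>   n = 5000,
>   free_doc = FALSE,
>   rm_doc = FALSE,
>   use_gc = FALSE
> ) {
>   lapply(1:n, function(ii) {
>     doc <- xmlParse(x)        # parse into a C-level libxml2 document
>     if (free_doc) free(doc)   # explicitly release the C-level document
>     if (rm_doc) rm(doc)       # drop the R-level reference
>     if (use_gc) gc()          # force a garbage collection
>     NULL
>   })
> }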
> 2) If I knew my way around OSX or Linux, I would be happy to go with your
> suggestions - but as I'm not, unfortunately that's out of reach for me. But
> IMO, a deeper level of cross-platform expertise should **not** be a
> general prerequisite before you can ask for help - even at r-devel (as
> opposed to r-help). However, AFAIK from past conversations with Duncan, the
> problem is indeed Windows-specific as on all his non-Windows infrastructure
> (definitely Linux, possibly OSX), everything went fine.
> 3) The same goes for the level of expertise in C. After all, R is not C. I
> totally agree that the more programming languages one knows, the better.
> But again: I don't think that knowing your way around C should be a
> prerequisite for asking for help when an *R function* interfacing C causes
> trouble. Requesting this would sort of oppose R's nature/paradigm of being
> an awesome "top-level" interfacing language. But I'll try to narrow the
> problem down on a C-level if I can help you with that.
> 4) Both Duncan and Hadley have suggested that libxml2 is indeed
> causing the problem. So trying to link against another build would possibly
> be a great way to start! How would I go about that?
> Thanks in advance if you take the time to look into this further!
> Janko
> On Mon, Dec 15, 2014 at 4:54 AM, Jeroen Ooms <jeroenooms at gmail.com> wrote:
>> On Thu, Dec 11, 2014 at 12:13 PM, Janko Thyson <janko.thyson at gmail.com>
>> wrote:
>>> I'd so much appreciate if someone could have a look at this. If I can be
>>> of any help whatsoever, please let me know!
>> Your current code uses various functions from XML and rvest so it is not
>> a *minimal* reproducible example. Even if you are unfamiliar with C, you
>> should be able to investigate exactly which function in the XML package you
>> think has issues. Once you have found the problematic R function, inspect the
>> source code or use debug() to see if you can narrow it down even further,
>> preferably to a particular call to C.
>> Moreover you should create a reproducible example that allows us (and
>> you) to test if this problem appears on other systems such as OSX or Linux.
>> Development and debugging on Windows is very painful so your Windows-only
>> example is not too helpful. Making people use Windows is not a good
>> strategy for getting help.
>> If the "leak" does not appear on other systems, it is likely a problem in
>> the libxml2 Windows library on CRAN. In that case we can try to link
>> against another build. On the other hand, if the problem does appear across
>> systems, and you have provided a minimal reproducible example that
>> pinpoints the problematic C function, we can help you review/debug the C
>> code to see if/where some allocated object is not properly freed.
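
A minimal sketch of the kind of cross-platform check discussed above: it reads the resident memory of the running R process before and after repeating some piece of work, so the same script can be compared on Windows, Linux and OSX. The helper names process_rss_kb() and rss_growth_kb() are made up for illustration and are not part of XML or base R. The sketch assumes that the tasklist (Windows) or ps (Linux/OSX) command-line tool is on the PATH and that tasklist reports its memory column in K, which has not been checked for every locale.

```r
# Resident memory of an R process in kilobytes (illustrative helper).
process_rss_kb <- function(pid = Sys.getpid()) {
  if (.Platform$OS.type == "windows") {
    # Assumed tasklist CSV row: "Rterm.exe","1234","Console","1","45,678 K"
    out <- system2("tasklist",
                   c("/FI", shQuote(paste("PID eq", pid)), "/FO", "CSV", "/NH"),
                   stdout = TRUE)
    fields <- strsplit(out[length(out)], "\",\"", fixed = TRUE)[[1]]
    # Keep only the digits of the last field (drops separators and " K")
    as.numeric(gsub("[^0-9]", "", fields[length(fields)]))
  } else {
    # ps prints the resident set size in kilobytes, without a header
    out <- system2("ps", c("-o", "rss=", "-p", pid), stdout = TRUE)
    as.numeric(trimws(out[1]))
  }
}

# Call `work` n times and return the change in resident memory (kilobytes).
rss_growth_kb <- function(work, n = 1000) {
  before <- process_rss_kb()
  for (i in seq_len(n)) work()
  invisible(gc())  # collect R-level garbage so it is not counted as growth
  process_rss_kb() - before
}
```

With the XML package installed, a call such as rss_growth_kb(function() XML::free(XML::xmlParse(x)), n = 5000), where x is the path to an XML file, gives one number per platform to compare. Growth that shows up only on Windows would point towards the libxml2 build rather than the R code.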


More information about the R-devel mailing list