[Rd] Significant memory leak when using XML on Windows

Janko Thyson janko.thyson at gmail.com
Mon Dec 15 13:02:58 CET 2014


@Jeroen: nope, seems like the problem unfortunately persists:

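library(XML)

# Memory (in MB) of a process, as reported by the Windows `tasklist` command
getTaskMemoryByPid <- function(
  pid = Sys.getpid()
) {
  cmd <- sprintf("tasklist /FI \"pid eq %s\" /FO csv", pid)
  # The fifth CSV column holds the memory usage as a string with a "K" suffix
  mem <- read.csv(text = shell(cmd, intern = TRUE),
                  stringsAsFactors = FALSE)[, 5]
  # Strip the "." thousands separator, whitespace and the unit; then KB -> MB
  mem <- as.numeric(gsub("\\.|\\s|K", "", mem))/1000
  mem
}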
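# Compare R's own accounting with what the OS reports for the R process
getCurrentMemoryStatus <- function() {
  mem_os  <- getTaskMemoryByPid()  # MB as seen by Windows
  mem_r   <- memory.size()         # MB allocated by R (Windows only)
  list(r = mem_r, os = mem_os, ratio = mem_os/mem_r)
}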
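# Parse the same XML file n times, optionally freeing/removing the document
memoryLeak <- function(
  x = system.file("exampleData", "mtcars.xml", package="XML"),
  n = 5000,
  free_doc = FALSE,
  rm_doc = FALSE,
  use_gc = FALSE
) {
  # invisible() keeps the list of n NULLs from being printed at top level
  invisible(lapply(seq_len(n), function(ii) {
    doc <- xmlParse(x)
    if (free_doc) free(doc)  # release the C-level document
    if (rm_doc) rm(doc)      # drop the R reference
    if (use_gc) gc()         # force a garbage collection
    NULL
  }))
}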
mem_1 <- getCurrentMemoryStatus()
memoryLeak(n = 50000, free_doc = TRUE, rm_doc = TRUE)
mem_2 <- getCurrentMemoryStatus()

> rbind(data.frame(mem_1), data.frame(mem_2))

      r      os    ratio
1 63.65  87.148 1.369175
2 97.63 122.160 1.251255
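
To put the two rows above in perspective, here is a small sketch (not part of the run itself; the names `growth_mb` and `growth_kb_per_call` are made up for illustration) that computes the total growth and the approximate growth per xmlParse() call from the reported values. It assumes the same factor of 1000 between KB and MB that getTaskMemoryByPid() uses.

```r
# Values copied from the table above (in MB): row 1 = before, row 2 = after
mem <- data.frame(
  r  = c(63.65, 97.63),
  os = c(87.148, 122.160)
)
n <- 50000  # number of xmlParse() calls in the run above

# Total growth per column (named numeric vector with elements "r" and "os")
growth_mb <- unlist(mem[2, ] - mem[1, ])

# Approximate growth per call in KB (same 1000 factor as getTaskMemoryByPid)
growth_kb_per_call <- growth_mb * 1000 / n

print(round(growth_mb, 3))
print(round(growth_kb_per_call, 3))
```

That is roughly 34 MB (R) and 35 MB (OS) in total, or about 0.7 KB per call, even though free() and rm() were called on every document.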



On Mon, Dec 15, 2014 at 12:25 PM, Janko Thyson <janko.thyson at gmail.com>
wrote:
>
> Sorry guys, didn't see your responses before sending mine.
>
> Thanks, Jeroen! I'll test your version today and get back to you.
>
> Sent from my smartphone
> On 15.12.2014 at 12:12, "Janko Thyson" <janko.thyson at gmail.com> wrote:
>
> > Thanks a lot for answering. Before I get into it, please note that
> > everything below comes under the big heading "Thanks for trying to help
> > me at all".
> >
> > 1) Yeah, those examples - quite hard to satisfy everyone's needs ;-) While
> > one side complained that my past examples regarding this issue were not
> > informative enough, others didn't like the more elaborate version (as
> > seems to be the case for you). I simply tried to make it as easy as
> > possible for people to see what's actually going on, so they wouldn't have
> > to program their own stuff for things like reading the actual memory
> > consumed by the Rterm process etc. If you prefer plain vanilla, though, I
> > guess this would be it:
> >
> > memoryLeak <- function(
> >   x = system.file("exampleData", "mtcars.xml", package="XML"),
> >   n = 5000,
> >   free_doc = FALSE,
> >   rm_doc = FALSE,
> >   use_gc = FALSE
> > ) {
> >   lapply(1:n, function(ii) {
> >     doc <- xmlParse(x)
> >     if (free_doc) free(doc)
> >     if (rm_doc) rm(doc)
> >     if (use_gc) gc()
> >     NULL
> >   })
> > }
> >
> > 2) If I knew my way around OSX or Linux, I would be happy to go with your
> > suggestions - but as I don't, unfortunately that's out of reach for me. But
> > IMO, a deeper level of cross-platform expertise should **not** be a
> > general prerequisite before you can ask for help - even at r-devel (as
> > opposed to r-help). However, AFAIK from past conversations with Duncan, the
> > problem is indeed Windows-specific, as on all his non-Windows infrastructure
> > (definitely Linux, possibly OSX) everything went fine.
> >
> > 3) The same goes for the level of expertise in C. After all, R is not C. I
> > totally agree that the more programming languages one knows, the better.
> > But again: I don't think that knowing your way around C should be a
> > prerequisite for asking for help when an *R function* interfacing C causes
> > trouble. Requiring this would sort of go against R's nature/paradigm of
> > being an awesome "top-level" interfacing language. But I'll try to narrow
> > the problem down to the C level if that helps you.
> >
> > 4) Both Duncan and Hadley have suggested that libxml2 is indeed
> > causing the problem. So trying to link against another build would possibly
> > be a great way to start! How would I go about that?
> >
> > Thanks in advance if you take the time to look into this further!
> > Janko
> >
> > On Mon, Dec 15, 2014 at 4:54 AM, Jeroen Ooms <jeroenooms at gmail.com> wrote:
> >>
> >> On Thu, Dec 11, 2014 at 12:13 PM, Janko Thyson <janko.thyson at gmail.com>
> >> wrote:
> >>>
> >>> I'd so much appreciate it if someone could have a look at this. If I can
> >>> be of any help whatsoever, please let me know!
> >>>
> >>
> >> Your current code uses various functions from XML and rvest, so it is not
> >> a *minimal* reproducible example. Even if you are unfamiliar with C, you
> >> should be able to investigate exactly which function in the XML package you
> >> think has issues. Once you have found the problematic R function, inspect
> >> the source code or use debug() to see if you can narrow it down even
> >> further, preferably to a particular call to C.
> >>
> >> Moreover, you should create a reproducible example that allows us (and
> >> you) to test if this problem appears on other systems such as OSX or Linux.
> >> Development and debugging on Windows is very painful, so your Windows-only
> >> example is not too helpful. Making people use Windows is not a good
> >> strategy for getting help.
> >>
> >> If the "leak" does not appear on other systems, it is likely a problem in
> >> the libxml2 Windows library on CRAN. In that case we can try to link
> >> against another build. On the other hand, if the problem does appear across
> >> systems, and you have provided a minimal reproducible example that
> >> pinpoints the problematic C function, we can help you review/debug the C
> >> code to see if/where some allocated object is not properly freed.
> >>
> >>
> >>
> >>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
