[R] Parsing very large xml datafiles with SAX: How to profile <anonymous> functions?

Duncan Temple Lang dtemplelang at ucdavis.edu
Sat Oct 27 02:01:07 CEST 2012


Hi Frederic

 Perhaps the simplest way to profile the individual functions in your
handlers is to write each handler as a regular named function, i.e.
assigned to a variable in your workspace (or in a function body),
and then to write the handler functions you pass to the parser as thin
wrappers that call these by name:

  startElement = function(name, attr, ...) {
     # code you want to run when we encounter the start of an XML element
  }

  myText = function(...) {
     # code
  }

Then, when calling xmlEventParse(), pass wrappers that call these functions:

   xmlEventParse(filename,
                  handlers = list(.startElement = function(...) startElement(...),
                                  .text = function(...) myText(...)))

Then the profiler will see the calls to startElement and myText, because
they are now called by name from R code rather than invoked directly
by the parser.

There is a small overhead from the extra layer of function calls, but you will get the profile information.
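
As a rough end-to-end sketch of the approach above (not tested against your
data): the generated sample file, the counts environment, the size n and the
bodies of the two handlers are all made up for illustration; only
xmlEventParse(), Rprof() and summaryRprof() are real functions.

```r
library(XML)

# Hypothetical sample input, generated so that the sketch is self-contained.
n <- 50000L
xml_file <- tempfile(fileext = ".xml")
writeLines(c("<doc>",
             sprintf("  <item id='%d'>value %d</item>", seq_len(n), seq_len(n)),
             "</doc>"),
           xml_file)

# Illustrative state shared by the handlers (an assumption, not part of XML).
counts <- new.env()
counts$elements <- 0L
counts$chars <- 0

# Named handlers: these are the names that should show up in the profile.
startElement <- function(name, attrs, ...) {
  counts$elements <- counts$elements + 1L
}

myText <- function(content, ...) {
  counts$chars <- counts$chars + sum(nchar(content))
}

# Profile the parse; the handlers passed in are thin wrappers that call
# the named functions above.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
xmlEventParse(xml_file,
              handlers = list(.startElement = function(...) startElement(...),
                              .text = function(...) myText(...)))
Rprof(NULL)

# startElement and myText should now appear as separate rows.
head(summaryRprof(prof_file)$by.total, 10)
```

The wrappers themselves will still be reported as <Anonymous>, but the time
spent in each handler is now attributed to its own name.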

  D.

On 10/26/12 9:49 AM, Frederic Fournier wrote:
> Hello everyone,
> 
> I'm trying to parse a very large XML file using SAX with the XML package
> (i.e., mainly the xmlEventParse function). This function takes as an
> argument a list of other functions (handlers) that will be called to handle
> particular xml nodes.
> 
> When I use Rprof(), all the handler functions are lumped together under
> the <anonymous> label, and I get something like this:
> 
> $by.total
>                            total.time total.pct self.time self.pct
> "system.time"                  151.22     99.99      0.00     0.00
> "MyParsingFunction"            149.38     98.77      0.00     0.00
> "xmlEventParse"                149.38     98.77      0.00     0.00
> ".Call"                        149.32     98.73      3.04     2.01
> "<Anonymous>"                  146.74     97.02    141.26    93.40    <--- !!
> "xmlValue"                       3.04      2.01      0.46     0.30
> "xmlValue.XMLInternalNode"       2.58      1.71      0.14     0.09
> "standardGeneric"                2.12      1.40      0.50     0.33
> "gc"                             1.86      1.23      1.86     1.23
> ...
> 
> 
> Is there a way to make Rprof() identify the different handler functions, so
> I can know which one might be a bottleneck? Is there another profiling tool
> that would be more appropriate in a case like this?
> 
> Thank you very much for your help!
> 
> Frederic