[R] xmlOutputBuffer vs xmlOutputDOM

Duncan Temple Lang duncan at wald.ucdavis.edu
Sun Jul 8 20:13:32 CEST 2007


Hi Arjun

Have you tried xmlTree()? It uses an opaque C representation
of the document, and I expect it will serialize the contents
relatively rapidly. The interface for creating the tree is
intended to be the same as that of xmlOutputDOM, or at least
very similar, so that the two representations are easily
interchangeable.
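
A minimal sketch of that substitution, reusing the loop from your code.
It assumes a version of the XML package in which xmlTree("outside")
creates the top-level node and leaves it open, and it writes to a
temporary file (the name `out` is just for illustration). The
as.character() call is a cautious choice rather than a requirement.

```r
library(XML)

# xmlTree() builds the document as libxml2 nodes in C rather than as R lists.
# Passing a tag name creates the top-level node, which stays open for children.
tr <- xmlTree("outside")

for (i in 1:1000) {
  # Same call shape as with xmlOutputDOM(); as.character() makes sure the
  # content is added as a text node.
  tr$addTag("tag", as.character(i))
}

# tr$value() returns the internal document, which saveXML() can serialize.
out <- tempfile(fileext = ".xml")
elapsed <- system.time(saveXML(tr$value(), file = out))
print(elapsed)
```

Timings will depend on the machine and the package version, so compare
them against your own xmlOutputDOM run rather than against any fixed number.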

xmlOutputDOM is slow because it represents the tree in R as
a list of lists. You might also use xmlHashTree(), which uses
a more efficient representation in R. But the nature of
the C representation (and not just the fact that it uses C code)
will probably speed things up considerably.


A question that comes to mind is why you care about pretty
printing the resulting document if it is very large.
Will a human read it? If so, and it is just to verify that it is correct,
read it back into R and validate the contents programmatically.
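
A rough sketch of such a check, under some assumptions: the file has the
same shape as the one your code produces (an <outside> element holding
<tag> children numbered 1 to 1000), and a small stand-in file is written
first so the example runs on its own. The variable names are illustrative.

```r
library(XML)

# Stand-in for the real output: a small file with the same shape as foo.xml.
path <- tempfile(fileext = ".xml")
tags <- paste("  <tag>", 1:1000, "</tag>", sep = "")
writeLines(c("<outside>", tags, "</outside>"), path)

# Parse into the internal (C-level) representation and pull out the values.
doc <- xmlTreeParse(path, useInternalNodes = TRUE)
root_name <- xmlName(xmlRoot(doc))
values <- as.integer(xpathSApply(doc, "/outside/tag", xmlValue))

# Fail loudly if the structure or the contents are not what was written.
stopifnot(root_name == "outside")
stopifnot(identical(values, 1:1000))
```

Because the check uses XPath rather than line positions, it does not
depend on how the file is indented.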


  D.


Arjun Ravi Narayan wrote:
> Hi,
> 
> I am trying to use the XML package to write some data (pretty large
> amounts of data) into XML files. I experimented with a few variations,
> using xmlOutputBuffer and xmlOutputDOM.
> 
> xmlOutputDOM provides neatly formatted, indented output, but takes very
> long. xmlOutputBuffer is incompatible (in my experience) with the
> saveXML function, and so I hacked around it by outputting its $value()
> to cat. This unfortunately makes it lose all proper formatting, and so
> gives me an XML file with new lines after every tag or entry, and with
> no indenting at all.
> 
> However, xmlOutputDOM takes very long - I am outputting rather large
> files, and where xmlOutputBuffer takes about 10-15 seconds,
> xmlOutputDOM takes about 20 minutes.
> 
> Am I using xmlOutputDOM in some wrong way? Is there a way to get
> proper formatting out of xmlOutputBuffer? Either of these solutions
> would be useful, as I see no advantage to using one over the other for
> just outputting lots of data (>10000 fields at minimum)
> 
> Below is my code, and after that, an output of the times that were
> reported on a sample run:
> 
> 
> library(XML)
> 
> buffer <- xmlOutputBuffer()
> buffer2 <- xmlOutputDOM()
> 
> buffer$addTag("outside", close = FALSE)
> buffer2$addTag("outside", close = FALSE)
> 
> for(i in 1:1000) {
>  buffer$addTag("tag", i)
>  buffer2$addTag("tag", i)
> }
> 
> buffer$closeTag()
> buffer2$closeTag()
> 
> system.time(cat(buffer$value(), file = "foo2.xml"))
> system.time(saveXML(buffer2$value(), file = "foo.xml"))
> 
> 
> Times reported : the xmlOutputDOM is more than 100x slower.
> 
>> system.time(cat(buffer$value(), file = "foo2.xml"))
>   user  system elapsed
>  0.004   0.000   0.001
>> system.time(saveXML(buffer2$value(), file = "foo.xml"))
>   user  system elapsed
>  0.476   0.024   0.516
> 
> 
> I am using R version 2.5.1, and XML package version 1.9-0
> 
> 
> Yours sincerely,
> Arjun Ravi Narayan
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


