[R] Parsing very large xml datafiles with SAX (XML package): What data structure should I favor?
R. Michael Weylandt <firstname.lastname@example.org>
michael.weylandt at gmail.com
Fri Oct 26 23:02:18 CEST 2012
I'd look into the data.table package.
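For instance, one common pattern (a sketch with assumed names and data, not code from this thread) is to stash each record the SAX callbacks deliver into a pre-sized list and bind everything once at the end with data.table::rbindlist(), rather than growing a data.frame row by row:

```r
# Sketch: accumulate parsed records in a pre-sized list, bind once at the end.
# The callback name and sample data below are hypothetical.
library(data.table)

records <- vector("list", 1000L)   # pre-size to a generous upper bound
n <- 0L
startElement <- function(name, attrs) {   # hypothetical SAX handler
  n <<- n + 1L
  records[[n]] <<- as.list(attrs)
}

# Simulate two events the parser would deliver:
startElement("row", c(id = "1", score = "3.14"))
startElement("row", c(id = "2", score = "2.72"))

# One cheap bind at the end, instead of many expensive rbind() calls:
dt <- rbindlist(records[seq_len(n)], fill = TRUE)
```

The key point is that each callback does only an O(1) list assignment; all the column-building cost is paid a single time in rbindlist().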
On Oct 26, 2012, at 6:00 PM, Frederic Fournier <frederic.bioinfo at gmail.com> wrote:
> Hello again,
> I have another question related to parsing a very large xml file with SAX:
> what kind of data structure should I favor? Unlike using DOM function that
> can return lists of relevant nodes and let me use various versions of
> 'apply', the SAX parsing returns me one thing at a time.
> I first tried the simple solution of appending to lists as I get the data,
> but I very soon realized that this is way too slow.
> Then I tried pre-declaring large data.frames of NAs and populating them
> with [[<-.data.frame. But this is quite slow too.
> I then tried pre-declaring large matrices of NA and populating them with
> [<-. This is better... but still unmanageable as the xml files become large.
> I also tried using an environment as a hash structure, but realized that
> while this is easy on the programmer, it stalls the
> I then tried to
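The slow appends described above come from copying the whole structure on every `c(res, x)`. A base-R sketch (assumed, not from the original message) of a closure-based accumulator that doubles its backing list when full, giving amortized O(1) appends:

```r
# Sketch: an accumulator whose backing list grows by doubling, so each add()
# is amortized O(1) instead of copying the whole result on every append.
make_accumulator <- function(init = 16L) {
  buf <- vector("list", init)
  n <- 0L
  list(
    add = function(x) {
      n <<- n + 1L
      if (n > length(buf)) length(buf) <<- 2L * length(buf)  # grow by doubling
      buf[[n]] <<- x
    },
    result = function() buf[seq_len(n)]   # trim unused slots
  )
}

acc <- make_accumulator()
for (i in 1:100) acc$add(list(id = i))
out <- acc$result()
```

The closure keeps `buf` and `n` private, so the SAX handlers only ever call acc$add(); the final trimmed list can then be fed to whatever bind step you prefer.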