[R] Parsing XML?

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Thu Jul 28 08:49:22 CEST 2022


On Wed, 27 Jul 2022 15:50:55 -0500
Spencer Graves <spencer.graves using effectivedefense.org> wrote:

> What would you suggest I do to parse the following XML file into a
> list that I can understand:
> 
> XMLfile <-
> "https://chroniclingamerica.loc.gov/data/bib/worldcat_titles/bulk5/ndnp_Alabama_all-yrs_e_0001_0050.xml" 

> XMLdat <- XML::xmlParse(XMLdata)
> str(XMLdat)

Isn't XMLdat already a tree-like list? For example,
XMLdat[[1]][[1]][[3]][[1]] is the first <record> tag in the file, which
you can further pick apart.
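
As a rough sketch of what "picking apart" can look like with the XML package's accessor functions: the document below is a made-up miniature (the element names <collection>, <leader> and <controlfield> are assumptions for illustration, not taken from your file), parsed from a string so that the example runs without network access.

```r
library(XML)

# Hypothetical miniature document; the real file is much larger and
# may be structured differently.
txt <- paste0(
  '<collection><record>',
  '<leader>abc</leader>',
  '<controlfield tag="001">123</controlfield>',
  '</record></collection>'
)
doc <- xmlParse(txt, asText = TRUE)

root <- xmlRoot(doc)                    # top-level element of the document
root_name <- xmlName(root)              # its tag name
n_children <- xmlSize(root)             # how many child nodes it has

rec <- root[[1]]                        # first child, here the <record>
child_names <- names(xmlChildren(rec))  # tag names of the children of <record>

leader <- xmlValue(rec[["leader"]])           # text content of <leader>
ctl_attrs <- xmlAttrs(rec[["controlfield"]])  # named vector of attributes
```

Indexing by position works, but it breaks as soon as the layout of the file changes, which is why I would rather address nodes by name or by XPath.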

What information do you need from this file and how would you like to
access it? Parsing XML files is typically achieved with XPath
expressions and/or handlers on specific tags, not by extracting all
text nodes and performing string operations on them. For example,
'under every <record> tag, extract the <datafield> tags whose attribute
tag equals "042"' would look like 'record/datafield[@tag="042"]'. Note
the '@': without it, XPath would look for a child element named <tag>
instead of an attribute.
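
To make the XPath approach concrete, here is a minimal sketch using getNodeSet() and xpathSApply() from the XML package. The sample document is invented for illustration (in particular the <subfield> elements, their code attributes and the tag="245" fields are assumptions), and the attribute test in the predicate is written with a leading '@'.

```r
library(XML)

# Made-up sample imitating a <record>/<datafield> layout; the real
# file is larger and may declare XML namespaces.
txt <- '<collection>
  <record>
    <datafield tag="042"><subfield code="a">pcc</subfield></datafield>
    <datafield tag="245"><subfield code="a">Example title</subfield></datafield>
  </record>
  <record>
    <datafield tag="245"><subfield code="a">Another title</subfield></datafield>
  </record>
</collection>'

doc <- xmlParse(txt, asText = TRUE)

# All <datafield> nodes directly under a <record> that carry tag="042"
nodes <- getNodeSet(doc, '//record/datafield[@tag="042"]')

# Apply xmlValue() to every match to get the text as a character vector
codes <- xpathSApply(doc, '//record/datafield[@tag="042"]/subfield', xmlValue)

# Same idea for another field: one title per record in this sample
titles <- xpathSApply(
  doc, '//record/datafield[@tag="245"]/subfield[@code="a"]', xmlValue
)
```

If the real file declares a default namespace on its root element, unprefixed names like 'record' will probably match nothing; in that case the paths need a prefix that is mapped to the namespace URI through the 'namespaces' argument of getNodeSet() / xpathSApply().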

-- 
Best regards,
Ivan


