[R] Extracting XML value

Ben Tupper btupper at bigelow.org
Thu Sep 3 22:41:14 CEST 2015


Hi,

You are very close and your understanding is correct. You need to extract the root node from the document object returned by xmlTreeParse, then read the attribute from its child node.

library(XML)

txt <-  "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n<observations realtime_start=\"2015-09-03\" realtime_end=\"2015-09-03\" observation_start=\"2015-09-01\" observation_end=\"2015-09-01\" units=\"lin\" output_type=\"1\" file_type=\"xml\" order_by=\"observation_date\" sort_order=\"asc\" count=\"1\" offset=\"0\" limit=\"100000\">\n  <observation realtime_start=\"2015-09-03\" realtime_end=\"2015-09-03\" date=\"2015-09-01\" value=\"0.46\"/>\n</observations>\n\n\n\n"

# parse the text tree and extract the root node
obs <- xmlRoot(xmlTreeParse(txt, useInternalNodes = TRUE, asText = TRUE))

# get the first child node named 'observation' (there is only one)
obs1 <- obs['observation'][[1]]

# the node has no text value, only attributes, and 'value' is one of them
xmlAttrs(obs1)[['value']]
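
If you expect more than one observation in a response, the same idea can be sketched a little more generally. This is only an illustration, not something I have run against the real API: the XML string below is a shortened stand-in with the same structure as yours, and the variable names are made up for the example. It uses xmlGetAttr to fetch a single attribute by name and xpathSApply to collect that attribute from every 'observation' node at once.

```r
library(XML)

# shortened stand-in for the API response (same structure, fewer attributes)
txt <- "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n<observations count=\"1\">\n  <observation date=\"2015-09-01\" value=\"0.46\"/>\n</observations>"

# parse once and keep the document so it can be queried with XPath
doc <- xmlTreeParse(txt, useInternalNodes = TRUE, asText = TRUE)
obs <- xmlRoot(doc)

# first child named 'observation'
obs1 <- obs["observation"][[1]]

# xmlGetAttr() returns one attribute as character; convert it to numeric
value <- as.numeric(xmlGetAttr(obs1, "value"))

# XPath: collect the 'value' attribute of every <observation> node
values <- as.numeric(xpathSApply(doc, "//observation", xmlGetAttr, "value"))
```

Attributes always come back as character strings, hence the as.numeric() calls. xmlValue is not the right tool here because it returns the text between a node's opening and closing tags, and this node has none.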


Cheers,
Ben



On Sep 3, 2015, at 11:23 AM, Glenn Schultz <glennmschultz at me.com> wrote:

> All,
> 
> I have made it as far as generating an api call which returns the following xml
> [1] "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n<observations realtime_start=\"2015-09-03\" realtime_end=\"2015-09-03\" observation_start=\"2015-09-01\" observation_end=\"2015-09-01\" units=\"lin\" output_type=\"1\" file_type=\"xml\" order_by=\"observation_date\" sort_order=\"asc\" count=\"1\" offset=\"0\" limit=\"100000\">\n  <observation realtime_start=\"2015-09-03\" realtime_end=\"2015-09-03\" date=\"2015-09-01\" value=\"0.46\"/>\n</observations>\n\n\n\n"
> attr(,"Content-Type")
>               charset 
> "text/xml"    "UTF-8" 
> 
> following DTL's presentation on the Berkeley site and the package help I parsed the xml
> 
> doc = xmlTreeParse(USSW10, asText = TRUE, useInternal = TRUE)
> 
> which gives
> <?xml version="1.0" encoding="utf-8"?>
> <observations realtime_start="2015-09-03" realtime_end="2015-09-03" observation_start="2015-09-01" observation_end="2015-09-01" units="lin" output_type="1" file_type="xml" order_by="observation_date" sort_order="asc" count="1" offset="0" limit="100000">
>   <observation realtime_start="2015-09-03" realtime_end="2015-09-03" date="2015-09-01" value="0.46"/>
> </observations>
> 
> finally I try to extract the value 0.46 using the xmlValue function.  I have lost something in translation and I am unable to extract the value.  my understanding is I have one node with no children, correct?
> 
> -Glenn

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org


