[R] XML - get node by name

Duncan Temple Lang dtemplelang at ucdavis.edu
Sun Sep 7 16:17:47 CEST 2008


Hi Antje

Well, the XML package gives you a variety of ways to parse
an XML document and manipulate it in R.
Perhaps the approach that best matches the Java-style you
outline is to use XPath to access nodes.
To do this, you use
  doc = xmlTreeParse("filename.xml", useInternalNodes = TRUE)

and then access the elements of interest with XPath queries, e.g.
to get the value of the second <val> element within each <data>
element, use

  xpathApply(doc, "//data", function(n) xmlValue(n[[2]]))
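
As a self-contained sketch of the above, the following parses the example
document from the original question out of a string (asText = TRUE)
rather than from a file. It uses the XPath predicate val[2] in place of
n[[2]], so the result does not depend on how whitespace-only text nodes
are handled. The variable names are illustrative only.

```r
library(XML)

# Example document from the original question, held in a string
xml_text <- '<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>'

# asText = TRUE parses the string itself instead of treating it as a file name
doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# Text of the second <val> within each <data>, as a character vector
second_vals <- xpathSApply(doc, "//data/val[2]", xmlValue)

# The values carry surrounding spaces, so trim them before converting
second_nums <- as.numeric(trimws(second_vals))
print(second_nums)
```

For this document, second_nums holds 45 and 11.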

To get the first <val> node in the first <data> you could use

  doc[ "//data/val" ] [[1]]

or

  doc[[ "//data[1]/val[1]" ]]


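(Note that the indexing/subsetting is being done in different languages:
in the first form, [[1]] is R subsetting applied to the list of matching
nodes; in the second, [1] is an XPath predicate evaluated inside the query.)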
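
To make the difference concrete, here is a small sketch that does the
same lookup both ways. It uses getNodeSet(), the function form of an
XPath query in the XML package, and the example document from the
original question. The variable names are illustrative only.

```r
library(XML)

xml_text <- '<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>'
doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# R does the indexing: XPath returns all four <val> nodes,
# and [[1]] then picks the first element of that list.
all_vals <- getNodeSet(doc, "//data/val")
first_by_r <- all_vals[[1]]

# XPath does the indexing: the predicates select the node inside the
# query, so the node set that comes back has a single element.
first_by_xpath <- getNodeSet(doc, "//data[1]/val[1]")[[1]]

# An XPath position predicate is relative to each parent, so this
# matches the first <val> of every <data>, i.e. two nodes here.
first_in_each <- getNodeSet(doc, "//data/val[1]")

print(trimws(xmlValue(first_by_r)))
print(trimws(xmlValue(first_by_xpath)))
print(length(first_in_each))
```

Both approaches reach the same node here; the XPath form avoids
materializing nodes that are discarded straight away.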


Being able to access a node by just its name is convenient,
but it may not be adequate: you may pick up too many matching nodes.
XPath lets you keep the query simple when that is adequate and add
more explicit constraints on the path when more specificity is
necessary.  XPath is also a widespread standard mechanism for XML
rather than something specific to R or Java.
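
As a rough illustration of that trade-off, the sketch below first
queries by name alone and then adds attribute constraints. It also
defines get_value(), a hypothetical helper (not part of the XML package)
that mimics the Java-style lookup from the original question: a node
name, a 0-based index, and an optional context node for the "local"
case. It assumes that a relative XPath query can be run from a node,
which getNodeSet() supports for internal nodes.

```r
library(XML)

xml_text <- '<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>'
doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# By name alone: every <val> in the document matches
n_all <- length(getNodeSet(doc, "//val"))

# With explicit constraints: one <val>, identified by attributes
specific <- trimws(xpathSApply(doc, "//data[@loc='2']/val[@i='t2']", xmlValue))

# Hypothetical helper, for illustration only.
#   name    : element name to look for
#   index   : 0-based position among the matches (as in the Java code)
#   context : NULL for a global search, or a node to search beneath
get_value <- function(doc, name, index, context = NULL) {
  if (is.null(context)) {
    nodes <- getNodeSet(doc, paste0("//", name))       # whole document
  } else {
    nodes <- getNodeSet(context, paste0(".//", name))  # below the context node
  }
  trimws(xmlValue(nodes[[index + 1]]))                 # R indexing is 1-based
}

data_nodes <- getNodeSet(doc, "//data")
print(get_value(doc, "val", 1))                   # global search
print(get_value(doc, "val", 1, data_nodes[[1]]))  # local to the first <data>
print(get_value(doc, "val", 1, data_nodes[[2]]))  # local to the second <data>
```

With the example document, the three calls give "45", "45" and "11",
matching the results quoted in the original question.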

HTH,

  D.


Antje wrote:
> Hi there,
> 
> I try to rewrite some Java-code with R. It deals with reading XML files.
> I started with the XML package. In Java, I had a very useful method
> which gave me a node by using:
> 
> name of the node
> index of appearance
> start point: global (false) / local (true)
> 
> So, I could do something like this.
> 
> setCurrentChildNode("data", 0);
> getValueOfElement("val",1,true);
> --> gives 45
> 
> setCurrentChildNode("data", 1);
> getValueOfElement("val",1,true);
> --> gives 11
> 
> getValueOfElement("val",1,false);
> --> gives 45
> 
> <root>
>   <data loc="1">
>     <val i="t1"> 22 </val>
>     <val i="t2"> 45 </val>
>   </data>
>   <data loc="2">
>     <val i="t1"> 44 </val>
>     <val i="t2"> 11 </val>
>   </data>
> </root>
> 
> Now, I'd like to do something like this in R. Most important would be to
> retrieve a node just by its name, not by the whole path. How is it
> possible?
> 
> Can anybody help me with this issue?
> 
> Antje
> 