[R] another XML package question

Antje niederlein-rstat at yahoo.de
Mon Sep 8 17:33:52 CEST 2008


Duncan Temple Lang wrote:
> 
> Antje wrote:
>> Hi Duncan,
>>
>> thanks a lot for your explanations.
>>
>> I tried the following now to understand a bit more:
>>
>> data <- getNodeSet(doc, "//Data")
>> xmlName(data[[1]])
>> xmlName(xmlRoot(data[[1]]))
>> xpathApply(data[[1]], "./*", xmlName)
>>
>> Is it right that using "data" in the xpathApply() somehow sets the
>> current node but does not change the root?
> 
> The answer is "it depends", specifically on which version of
> the XML package you have.
> In version 1.96-0 (the latest release), yes.
> There is also code in the package (currently overridden)
> that creates a new temporary tree with the given node as the
> root of the new tree (but without copying the nodes).
> But the former behavior is most likely what is desired.
> 
>> So looking for a subnode at all levels below my current node is not
>> possible with the XPath syntax?
> 
> It is possible
> 
>   getNodeSet( data[[1]], ".//*")


All right, I hadn't tried this (I assumed that // means "everything below the
root"...).
Now I can do what I was looking for.

Thanks a lot for everything!

> 
> does that. The // means "any level". BTW, it doesn't match text
> nodes, so you might want
>           ".//*|.//text()|.//processing-instruction()"
> for completeness (or maybe not!)
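A minimal sketch of the difference, assuming the XML package is installed and reusing the sample document posted later in this thread; `node`, `elements`, `everything` and `texts` are illustrative names only. The `.//*` query returns just the element nodes below the first <data> node, while the union also picks up their text.

```r
library(XML)

# Sample document posted later in this thread
doc <- xmlParse('<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>', asText = TRUE)

node <- getNodeSet(doc, "//data")[[1]]   # the first <data> node

# Element nodes only: the two <val> children
elements <- getNodeSet(node, ".//*")

# Elements plus text nodes (and any processing instructions)
everything <- getNodeSet(node, ".//*|.//text()|.//processing-instruction()")

# The text nodes on their own; whitespace-only nodes, if the parser
# kept any, are dropped after trimming
texts <- trimws(xpathSApply(node, ".//text()", xmlValue))
texts <- texts[nzchar(texts)]
```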
> 
> The key thing is that when you supply a node (and not the document)
> as the first argument of getNodeSet() or xpathApply(), the XPath
> query should be a relative query, e.g. .//* rather than //*.
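A small sketch of that rule, assuming a version of the XML package that keeps the document root when a node is passed (as described above for 1.96-0) and reusing the sample document from later in the thread. The absolute query still searches the whole document; the relative one searches only below the node.

```r
library(XML)

# Sample document posted later in this thread
doc <- xmlParse('<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>', asText = TRUE)

node <- getNodeSet(doc, "//data")[[1]]   # the first <data> node

# Absolute query: evaluated from the document root even though a node
# was supplied, so it matches all four <val> elements
n_absolute <- length(getNodeSet(node, "//val"))

# Relative query: evaluated from 'node', so only its own two <val> elements
n_relative <- length(getNodeSet(node, ".//val"))
```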
> 
> And the reason for keeping the root the same is so that we can do
> 
>   getNodeSet(data[[1]], "ancestor::*")
> or
>   getNodeSet(data[[1]], "../foo")
> 
> i.e. have an XPath expression that refers to nodes "higher" up the tree.
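For instance, a sketch using the sample document from later in the thread (assuming the XML package is installed; `up_names` and `locs` are illustrative names). Starting from the first <data> node, both kinds of "upward" query work because the root was not reset.

```r
library(XML)

# Sample document posted later in this thread
doc <- xmlParse('<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>', asText = TRUE)

node <- getNodeSet(doc, "//data")[[1]]   # the first <data> node

# "ancestor::*" climbs above the context node: here only <root>
up_names <- sapply(getNodeSet(node, "ancestor::*"), xmlName)

# "../data" steps up to the parent and back down, giving both <data> nodes
locs <- sapply(getNodeSet(node, "../data"), xmlGetAttr, "loc")
```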
> 
>  D.
> 
>> (a search on all levels starting from the root
>> is possible with "//nodename")
>>
>> Antje
>>
>>
>>
>>
>> Duncan Temple Lang wrote:
>>
>>
>> Antje wrote:
>>>>> Hi there,
>>>>>
>>>>> does anybody know how to return the xmlPath from a node?
>>>>> For example, at several locations in the XML file, I have nodes with the
>>>>> same name, and I'd like to process only the nodes from a certain path.
>>>>>
>>>>> Any idea?
>> As with your previous question, there are ways to do this
>> with either XPath queries or R functions that operate on
>> the nodes from the earlier queries.
>>
>> By "xmlPath", let's assume you mean the ordered collection of
>> nodes from the node to the root node of the document,
>> i.e. the collection of ancestor nodes.
>> So using XPath, you could use
>>
>>    a = getNodeSet( node, "ancestor::*")
>>
>> where node is the R variable containing the node within the tree
>> whose ancestors you want, e.g.
>>     getNodeSet(doc, "//val")[[1]]
>>
>> The nodes are in "reverse" order.
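One way to turn those ancestors into a path string, sketched under two assumptions: the XML package is installed, and `ancestor::*` comes back root first (XPath document order, which is what "reverse" appears to mean here). The sample document is the one from later in the thread; `anc_names` and `path` are illustrative names.

```r
library(XML)

# Sample document posted later in this thread
doc <- xmlParse('<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>', asText = TRUE)

node <- getNodeSet(doc, "//val")[[1]]   # the first <val> node

# Element names of the ancestors, assumed to be in root-first order
anc_names <- sapply(getNodeSet(node, "ancestor::*"), xmlName)

# Join them top-down, ending with the node itself, like a file path
path <- paste0("/", paste(c(anc_names, xmlName(node)), collapse = "/"))
```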
>>
>>
>> You can do the same thing with the R function
>> xmlParent().  To get the ancestors,
>>
>>   tmp = xmlParent(node)
>>   ans = list()
>>   while( !is.null(tmp)) {
>>       ans = c(ans, tmp)
>>       tmp = xmlParent(tmp)
>>   }
>>
>> and of course in your case you could terminate the loop
>> at any point.
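The same loop wrapped as a function, as a sketch. It assumes the XML package is installed and relies, like the loop above, on xmlParent() returning NULL once it is past the root element. `ancestor_names()` and its `stop_at` argument are hypothetical names, not part of the package; `stop_at` is one way to "terminate the loop at any point".

```r
library(XML)

# Sample document posted later in this thread
doc <- xmlParse('<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>', asText = TRUE)

# Hypothetical helper: names of a node's ancestors, nearest first.
# stop_at (also hypothetical) ends the walk at the first ancestor
# whose element name matches it.
ancestor_names <- function(node, stop_at = NULL) {
  ans <- character(0)
  tmp <- xmlParent(node)
  while (!is.null(tmp)) {   # assumed NULL above the root element
    ans <- c(ans, xmlName(tmp))
    if (!is.null(stop_at) && xmlName(tmp) == stop_at) break
    tmp <- xmlParent(tmp)
  }
  ans
}

node <- getNodeSet(doc, "//val")[[1]]
all_up   <- ancestor_names(node)          # nearest ancestor first
first_up <- ancestor_names(node, "data")  # stops at the enclosing <data>
```

Unlike the XPath query, this walk yields the nearest ancestor first.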
>>
>>
>> But a different approach to the problem is to use a more specific
>> XPath query in the first place to get only the nodes of interest.
>> For example, to get the <val> nodes in the second <data> node of
>> your example, you could use
>>
>>   getNodeSet(doc, "//data[2]/val")
>>
>> or to find all <val> nodes which have the attribute i = "t2",
>>
>>    getNodeSet(doc, "//val[@i='t2']")
>>
>> Or to find all <val> nodes that have an ancestor with the attribute
>> loc = "1"
>>
>>      getNodeSet(doc, "//*[@loc='1']//val")
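A sketch that runs the three queries against the sample document shown just below, assuming the XML package is installed. It uses xpathSApply() to pull each matching node's text; the variable names are illustrative.

```r
library(XML)

# The sample document from this message
doc <- xmlParse('<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>', asText = TRUE)

# as.numeric() also discards the spaces around each value
second_data <- as.numeric(xpathSApply(doc, "//data[2]/val", xmlValue))
t2_vals     <- as.numeric(xpathSApply(doc, "//val[@i='t2']", xmlValue))
loc1_vals   <- as.numeric(xpathSApply(doc, "//*[@loc='1']//val", xmlValue))
```

Each query selects nodes by their position in the tree, so no separate path check is needed afterwards.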
>>
>>
>>
>> (
>> The sample XML document was
>>
>> <root>
>>    <data loc="1">
>>      <val i="t1"> 22 </val>
>>      <val i="t2"> 45 </val>
>>    </data>
>>    <data loc="2">
>>      <val i="t1"> 44 </val>
>>      <val i="t2"> 11 </val>
>>    </data>
>> </root>
>>
>> )
>>
>>
>>  D.
>>
>>>>> Antje
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
> 


