[R] Failure to understand namespaces in XML::getNodeSet

Mark Sharp msharp at txbiomed.org
Wed Feb 1 00:36:59 CET 2017


Hadley,

Thank you. I am able to get the xml_ns_strip() function to work with my file directly so I will likely be able to reach my immediate goal.

However, I have still had no success in understanding the namespace problem. I cannot call read_xml() on the object I generated for the reproducible example, which is simply a character vector of length 4 holding the contents of the XML file as produced by readLines(). I then used dput() to define the structure. The resulting structure is apparently not to the liking of read_xml(). I have reproduced the necessary code here for your convenience. The error is below.

##
library(xml2)
library(stringr)
with_ns_xml <- c("<?xml version=\"1.0\" ?>",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")
## Without the str_c() collapse it also complains of a vector of length > 1.
read_xml(str_c(with_ns_xml, collapse = TRUE))

## produces the following error message.
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html,  :
  Start tag expected, '<' not found [4]
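
One possible cause, offered as a guess rather than a diagnosis: `collapse` is meant to be a separator string, so `collapse = TRUE` may have placed the text "TRUE" between the lines, leaving the parser with non-markup text right after the XML declaration. A minimal sketch that joins the four lines with a newline before parsing is shown below. It uses base `paste()` instead of `str_c()`, and `xml_string` is a name introduced only for this illustration.

```r
library(xml2)

# The same four lines as in the reproducible example above.
with_ns_xml <- c("<?xml version=\"1.0\" ?>",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")

# Join the lines into a single string; collapse is the separator to use.
xml_string <- paste(with_ns_xml, collapse = "\n")

# read_xml() accepts a length-1 character string containing the XML itself.
doc <- read_xml(xml_string)

# Name of the root element of the parsed document.
xml_name(xml_root(doc))
```

The parsed `doc` object, rather than the character string, is what the other xml2 functions expect as input.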

I have similar issues with xml2::xml_find_all():
xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")

## Produces the following error message.
Error in UseMethod("xml_find_all") :
  no applicable method for 'xml_find_all' applied to an object of class "character"
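
The message above says that xml_find_all() has no method for a character string, which suggests it wants the parsed document. Below is a hedged sketch of the two routes mentioned in this thread: querying with a namespace prefix, and calling xml_ns_strip() first. It assumes that xml2 reports the default namespace under the generated prefix `d1`; the names `doc_ns`, `doc_stripped`, `found_prefixed` and `found_stripped` are introduced only for this illustration.

```r
library(xml2)

with_ns_xml <- c("<?xml version=\"1.0\" ?>",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")
xml_string <- paste(with_ns_xml, collapse = "\n")

# Route 1: keep the namespace and use its prefix in the XPath.
doc_ns <- read_xml(xml_string)
ns <- xml_ns(doc_ns)  # assumed to map d1 to http://labkey.org/etl/xml
found_prefixed <- xml_find_all(doc_ns, "/d1:WorkSet//d1:Description", ns)
xml_text(found_prefixed)

# Route 2: strip the default namespace, then use the plain XPath.
doc_stripped <- read_xml(xml_string)
xml_ns_strip(doc_stripped)  # modifies the document in place
found_stripped <- xml_find_all(doc_stripped, "/WorkSet//Description")
xml_text(found_stripped)
```

Route 1 leaves the document unchanged, which may matter if it is written back out later; route 2 keeps the XPath expressions shorter.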



R. Mark Sharp, Ph.D.
msharp at TxBiomed.org





> On Jan 31, 2017, at 4:27 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
>
> See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
>
> Hadley
>
> On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp <msharp at txbiomed.org> wrote:
>> I am trying to read a series of XML files that use a namespace and I have failed, thus far, to discover the proper syntax. I have a reproducible example below. I have two XML character strings defined: one without a namespace and one with. I show that I can successfully extract the node using the XML string without the namespace and fail when using the XML string with the namespace.
>>
>> Mark
>> PS I am having the same problem with the xml2 package and am hoping that understanding one will help with the other.
>>
>> ##
>> library(XML)
>> ## The first XML text (no_ns_xml) does not have a namespace defined
>> no_ns_xml <- c("<?xml version=\"1.0\" ?>", "<WorkSet>",
>>               "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>>               "</WorkSet>")
>> l_no_ns_xml <- xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>>                             useInternalNodes = TRUE)
>> ## The node is found
>> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
>>
>> ## The second XML text (with_ns_xml) has a namespace defined
>> with_ns_xml <- c("<?xml version=\"1.0\" ?>",
>>                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
>>                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>>                 "</WorkSet>")
>>
>> l_with_ns_xml <- xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>>                               useInternalNodes = TRUE)
>> ## The node is not found
>> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
>> ## I attempt to provide the namespace, but fail.
>> ns <-  "http://labkey.org/etl/xml"
>> names(ns)[1] <- "xmlns"
>> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
>>
>> R. Mark Sharp, Ph.D.
>> Director of Data Science Core
>> Southwest National Primate Research Center
>> Texas Biomedical Research Institute
>> P.O. Box 760549
>> San Antonio, TX 78245-0549
>> Telephone: (210)258-9476
>> e-mail: msharp at TxBiomed.org
>
>
>
> --
> http://hadley.nz
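
For the getNodeSet() attempt quoted above, a sketch of one way the namespace can be supplied with the XML package follows. In XPath 1.0 an unprefixed element name matches only elements in no namespace, so the prefix passed in `namespaces` also has to appear in the XPath expression itself. The prefix `x` and the names `l_with_ns_xml` (reused from the quoted code), `ns` and `nodes` are arbitrary choices for this illustration, not something the package requires.

```r
library(XML)

with_ns_xml <- c("<?xml version=\"1.0\" ?>",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")

# Parse from text, joining the lines into one string first.
l_with_ns_xml <- xmlTreeParse(paste(with_ns_xml, collapse = "\n"),
                              asText = TRUE, getDTD = FALSE,
                              useInternalNodes = TRUE)

# Bind an arbitrary prefix to the namespace URI used in the document.
ns <- c(x = "http://labkey.org/etl/xml")

# Use that prefix on every element name in the XPath expression.
nodes <- getNodeSet(l_with_ns_xml, "/x:WorkSet//x:Description",
                    namespaces = ns)
sapply(nodes, xmlValue)
```

The prefix only needs to match between the `namespaces` argument and the XPath; it does not have to match anything in the XML file, since the file declares the namespace as a default without a prefix.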
