[R] Parsing XML File

Lorenzo Isella lorenzo.isella at gmail.com
Tue Oct 13 21:09:48 CEST 2015


Dear Jim,
Thanks for your reply.
What you did is 100% what I need: I now have a data frame with the
relevant data and I can take it from there.
Regards

Lorenzo

On Sun, Oct 11, 2015 at 03:54:10PM -0400, jim holtman wrote:
>Not sure exactly what you want since you did not show an expected output,
>but this will extract the attributes from AccVal in the structure:
>
>> #####################################################################
>>  library(XML)
>>
>>  xmlfile=xmlParse("/temp/account.xml")
>>
>>  class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
>[1] "XMLInternalDocument" "XMLAbstractDocument"
>>  xmltop = xmlRoot(xmlfile) #gives content of root
>>
>>  #####  try this  ##############
>>
>>  accts <- sapply(getNodeSet(xmltop, "//AccVal"), xmlAttrs)
>>
>>  # create data.frame
>>  accts_df <- as.data.frame(t(accts), stringsAsFactors = FALSE)
>>  str(accts_df)
>'data.frame':   364 obs. of  4 variables:
> $ key        : chr  "AccountCode" "AccountReady" "AccountType" "AccruedCash" ...
> $ val        : chr  "DU108063" "true" "CORPORATION" "0" ...
> $ currency   : chr  "" "" "" "AUD" ...
> $ accountName: chr  "DU108063" "DU108063" "DU108063" "DU108063" ...
>>  head(accts_df)
>           key         val currency accountName
>1  AccountCode    DU108063             DU108063
>2 AccountReady        true             DU108063
>3  AccountType CORPORATION             DU108063
>4  AccruedCash           0      AUD    DU108063
>5  AccruedCash           0     BASE    DU108063
>6  AccruedCash           0      CAD    DU108063
>>
>
>
>Jim Holtman
>Data Munger Guru
>
>What is the problem that you are trying to solve?
>Tell me what you want to do, not how you want to do it.
>
>On Sun, Oct 11, 2015 at 3:10 PM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
>
>> Dear All,
>> I am struggling to parse the XML file you can find at
>>
>> https://www.dropbox.com/s/i4ld5qa26hwrhj7/account.xml?dl=0
>>
>> Essentially, I would like to convert it to a data.frame so that I can
>> manipulate it in R and find all the attributes of any account whose
>> unrealizedPNL exceeds a threshold.
>> I saved the file as account.xml and, after looking here and there on the
>> web, I put together the following script:
>>
>>
>> #####################################################################
>> library(XML)
>>
>> xmlfile=xmlParse("account.xml")
>>
>> class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
>> xmltop = xmlRoot(xmlfile) #gives content of root
>> class(xmltop) #"XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
>> xmlName(xmltop) #name of the root node
>> xmlSize(xmltop) #how many children the root node has
>> xmlName(xmltop[[1]]) #name of root's children
>>
>> # have a look at the content of the first child entry
>> xmltop[[1]]
>> # have a look at the content of the 2nd child entry
>> xmltop[[2]]
>> #Root Node's children
>> number <- xmlSize(xmltop[[1]]) #number of children of the first child node
>> name <- xmlSApply(xmltop[[1]], xmlName) #name(s)
>> attribute <- xmlSApply(xmltop[[1]], xmlAttrs) #attribute(s)
>> size <- xmlSApply(xmltop[[1]], xmlSize) #size
>>
>>
>> values <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
>> #####################################################################
>>
>> which is leading me nowhere.
>> Any suggestion is appreciated.
>> Cheers
>>
>> Lorenzo
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
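The remaining step Lorenzo describes, finding the accounts whose unrealizedPNL exceeds a threshold, can be done directly on accts_df. What follows is a minimal sketch, not code from the thread. It assumes the key is spelled "UnrealizedPnL" in account.xml and that the row with currency "BASE" is the one to compare; neither is confirmed here, since the file itself is not reproduced. accounts_above() is a hypothetical helper, and the small data frame is made-up stand-in data with the same four columns as Jim's str() output.

```r
# Hypothetical stand-in for accts_df: same columns as Jim's str() output,
# but the values are invented for illustration only.
accts_df <- data.frame(
  key         = c("AccountCode", "UnrealizedPnL", "UnrealizedPnL",
                  "AccountCode", "UnrealizedPnL", "UnrealizedPnL"),
  val         = c("ACC1", "1500.5", "1500.5", "ACC2", "-20", "-20"),
  currency    = c("", "BASE", "USD", "", "BASE", "USD"),
  accountName = c("ACC1", "ACC1", "ACC1", "ACC2", "ACC2", "ACC2"),
  stringsAsFactors = FALSE
)

# Return every row belonging to accounts whose unrealized P&L exceeds `threshold`.
# ASSUMPTIONS: the key is spelled "UnrealizedPnL" and the "BASE" currency row is
# the one to compare; change `key` and `currency` to match the real file.
accounts_above <- function(df, threshold, key = "UnrealizedPnL", currency = "BASE") {
  # Keep only the P&L rows in the chosen currency
  pnl <- df[df$key == key & df$currency == currency, ]
  # val is character (it also holds "true", "CORPORATION", ...), so convert here;
  # which() silently drops values that fail to convert (NA)
  hits <- unique(pnl$accountName[which(as.numeric(pnl$val) > threshold)])
  # Return all attributes of the matching accounts
  df[df$accountName %in% hits, ]
}

res <- accounts_above(accts_df, threshold = 1000)
print(res)
```

Because val mixes text and numbers, it stays character in the data frame and only the P&L rows are converted with as.numeric(). With the toy data above, threshold = 1000 returns the three ACC1 rows.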



More information about the R-help mailing list