[R] Create a Data Frame from an XML

Franzini, Gabriele [Nervianoms] Gabriele.Franzini at nervianoms.com
Thu Jan 24 14:40:35 CET 2013


Hello Adam,
I had a similar problem with a big data frame, and building an xmlTree in
the clean way was extremely slow, so I resorted to a manual method. Not
tested, but if your data frame is my_df, then something like the
following should do:

buildEntry <- function(x) {
    # x is one row of my_df, already coerced to character by apply()
    cat(paste('<z:row BRAND="', x[1],
                     '" NUM="', x[2],
                     '" YEAR="', x[3],
                     '" VALUE="', x[4],
                     '"/>\n', sep=""))
    }

# redirect everything cat() prints into the output file
sink('my_file.xml')

cat('<?xml version="1.0" encoding="ISO-8859-1"?>\n')
# declare the z: prefix used by the row elements
cat('<data xmlns:z="uri://localhost/z">\n')

# invisible avoids returning a NULL in the file
invisible(apply(my_df, 1, buildEntry))
cat("</data>\n")
# restore normal console output
sink()

And it took very little time.
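
A possible variation on the manual method, as a minimal sketch rather than a tested drop-in: build all row strings in one vectorized step and write them with writeLines(), escaping characters that would break an attribute value. The names escape_attr and build_rows, the sample my_df, and the temporary output file are assumptions for illustration; the namespace URI is the placeholder from the code above, bound here to the z prefix. Converting each column with as.character() also sidesteps the padding that apply() can introduce when it formats numeric columns to a common width.

```r
# Hypothetical stand-in for my_df, using values from the thread
my_df <- data.frame(BRAND = c("DODGE", "TOYOTA", "DODGE"),
                    NUM   = c(3, 4, 3),
                    YEAR  = c(1999, 2000, 1967),
                    VALUE = c("10000", "12000", "PRICELESS"),
                    stringsAsFactors = FALSE)

# Escape characters that are not allowed inside a double-quoted attribute value
escape_attr <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)   # must run first
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub('"', "&quot;", x, fixed = TRUE)
  x
}

# Build one <z:row .../> string per data-frame row, working column by column
build_rows <- function(df) {
  pieces <- lapply(names(df), function(nm) {
    paste0(nm, '="', escape_attr(as.character(df[[nm]])), '"')
  })
  paste0("<z:row ", do.call(paste, pieces), "/>")
}

xml_lines <- c('<?xml version="1.0" encoding="ISO-8859-1"?>',
               '<data xmlns:z="uri://localhost/z">',   # placeholder URI
               build_rows(my_df),
               '</data>')

# Written to a temporary file here; substitute the real output path
out_file <- tempfile(fileext = ".xml")
writeLines(xml_lines, out_file)
```

Because the attribute names come from names(df), the same function should work for data frames with other columns, provided the column names are valid XML attribute names.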

HTH,
Gabriele


-----Original Message-----
From: Adam Gabbert [mailto:adamjgabbert at gmail.com] 
Sent: Wednesday, January 23, 2013 5:36 PM
To: Duncan Temple Lang; btupper at bigelow.org
Cc: r-help at r-project.org
Subject: Re: [R] Create a Data Frame from an XML

Hello Gentlemen,

I mistakenly sent the message twice because I didn't receive a
notification message the first time, so I was unsure whether it went
through properly.

Your solutions worked great. Thank you!  I felt like I was fairly close;
I just couldn't quite get the final step.

Now, I'm trying to reverse the process and account for my header.

In other words I have my data frame in R:

BRAND    NUM    YEAR    VALUE
GMC      1      1999    10000
FORD     2      2000    12000
GMC      1      2001    12500
     etc........
and I make some edits.
BRAND    NUM    YEAR    VALUE
DODGE    3      1999    10000
TOYOTA   4      2000    12000
DODGE    3      2001    12500
So now I would need to output an XML file in the same format, accounting
for my header (essentially, add "z:" in front of row).

(What I want to output)
>   <data>
>   <z:row BRAND="DODGE" NUM="3" YEAR="1999" VALUE="10000" />
>   <z:row BRAND="TOYOTA" NUM="4" YEAR="2000" VALUE="12000" />
>   <z:row BRAND="DODGE" NUM="3" YEAR="2001" VALUE="12500" />
>   <z:row BRAND="TOYOTA" NUM="4" YEAR="2002" VALUE="13000" />
>   <z:row BRAND="DODGE" NUM="3" YEAR="2003" VALUE="14000" />
>   <z:row BRAND="TOYOTA" NUM="4" YEAR="2004" VALUE="17000" />
>   <z:row BRAND="DODGE" NUM="3" YEAR="2005" VALUE="15000" />
>   <z:row BRAND="DODGE" NUM="3" YEAR="1967" VALUE="PRICELESS" />
>   <z:row BRAND="TOYOTA" NUM="4" YEAR="2007" VALUE="17500" />
>   <z:row BRAND="DODGE" NUM="3" YEAR="2008" VALUE="22000" />
>   </data>
Thus far, from the help I've found online, I was trying to set up an
xmlTree

  xml <- xmlTree()

and use xml$addTag to create nodes and put in the data from my data
frame.
I feel like I'm not really even close to a solution so I'm starting to
believe that this might not be the best path to go down.
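
For reference, a minimal sketch of what a tree-based version might look like with the XML package, using newXMLNode() rather than xmlTree()/addTag. The sample my_df and the namespace URI are assumptions (the URI is only a placeholder and should match whatever the real header declares). As the reply above notes, building nodes one at a time was reported to be slow for a large data frame.

```r
library(XML)  # assumes the XML package is installed

# Hypothetical stand-in for the edited data frame
my_df <- data.frame(BRAND = c("DODGE", "TOYOTA"),
                    NUM   = c(3, 4),
                    YEAR  = c(1999, 2000),
                    VALUE = c("10000", "12000"),
                    stringsAsFactors = FALSE)

# Root element that declares the z prefix; the URI is a placeholder
root <- newXMLNode("data", namespaceDefinitions = c(z = "uri://localhost/z"))

for (i in seq_len(nrow(my_df))) {
  # Named character vector holding the attributes for row i
  row_attrs <- vapply(my_df, function(col) as.character(col[i]), character(1))
  # namespace = "z" refers to the prefix defined on the root node
  newXMLNode("row", attrs = row_attrs, namespace = "z", parent = root)
}

# Serialize the tree to a string; saveXML(root, file = "...") would write a file
xml_text <- saveXML(root)
cat(xml_text, "\n")
```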

Once again, any help is much appreciated.

AG


On Tue, Jan 22, 2013 at 6:04 PM, Duncan Temple Lang
<dtemplelang at ucdavis.edu> wrote:

>
> Hi Adam
>
>  [You seem to have sent the same message twice to the mailing list.]
>
> There are various strategies/approaches to creating the data frame 
> from the XML.
>
> Perhaps the approach that most closely follows your approach is
>
>   xmlRoot(doc)[ "row" ]
>
> which  returns a list of XML nodes whose node name is "row" that are 
> children of the root node <data>.
>
> So
>   sapply(xmlRoot(doc) [ "row" ], xmlAttrs)
>
> yields a matrix with as many columns as there are <row> nodes and
> with 4 rows, one for each of the BRAND, NUM, YEAR and VALUE attributes.
>
> So
>
>   d = t( sapply(xmlRoot(doc) [ "row" ], xmlAttrs) )
>
> gives you a matrix with the correct rows and column orientation and 
> now you can turn that into a data frame, converting the columns into 
> numbers, etc. as you want with regular R commands (i.e. independently 
> of the XML).
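
The "regular R commands" step might look like the following sketch. The matrix d is a hand-built stand-in for the result of t(sapply(xmlRoot(doc)["row"], xmlAttrs)), so nothing here depends on the XML package; the column VALUE_NUM is a hypothetical name introduced only for illustration.

```r
# Hypothetical stand-in for d: a character matrix with one row per <row> node
d <- matrix(c("GMC",  "1", "1999", "10000",
              "FORD", "1", "2000", "12000",
              "GMC",  "1", "1967", "PRICLESS"),
            ncol = 4, byrow = TRUE,
            dimnames = list(NULL, c("BRAND", "NUM", "YEAR", "VALUE")))

# Drop row names, in case the real matrix carries a repeated "row" label
rownames(d) <- NULL

# All columns start out as character
df <- as.data.frame(d, stringsAsFactors = FALSE)

# Convert the columns that are genuinely numeric
df$NUM  <- as.integer(df$NUM)
df$YEAR <- as.integer(df$YEAR)

# VALUE contains a non-numeric entry, so keep the text and add a numeric
# copy in which unparseable entries become NA
df$VALUE_NUM <- suppressWarnings(as.numeric(df$VALUE))

str(df)
```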
>
>
>  D.
>
> On 1/22/13 1:43 PM, Adam Gabbert wrote:
> >  Hello,
> >
> > I'm attempting to read information from an XML into a data frame in R
> > using the "XML" package. I am unable to get the data into a data frame
> > as I would like.  I have some sample code below.
> >
> > *XML Code:*
> >
> > Header...
> >
> > Data I want in a data frame:
> >
> >    <data>
> >   <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000" />
> >   <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
> >   <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" />
> >   <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" />
> >   <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" />
> >   <row BRAND="FORD" NUM="1" YEAR="2004" VALUE="17000" />
> >   <row BRAND="GMC" NUM="1" YEAR="2005" VALUE="15000" />
> >   <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
> >   <row BRAND="FORD" NUM="1" YEAR="2007" VALUE="17500" />
> >   <row BRAND="GMC" NUM="1" YEAR="2008" VALUE="22000" />
> >   </data>
> >
> > *R Code:*
> >
> > library(XML)
> > doc <- xmlInternalTreeParse("Sample2.xml")
> > top <- xmlRoot(doc)
> > xmlName(top)
> > names(top)
> > art <- top[["row"]]
> > art
> >
> > *Output:*
> >
> > > art
> > <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000"/>
> >
> >
> >
> > This is where I am having difficulties.  I am unable to "access"
> > additional rows (i.e.  <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />),
> > and I am unable to access the individual entries to actually create
> > the data frame.  The data frame I would like is as follows:
> >
> > BRAND    NUM    YEAR    VALUE
> > GMC      1      1999    10000
> > FORD     2      2000    12000
> > GMC      1      2001    12500
> >     etc........
> >
> > Any help or suggestions would be appreciated.  Conversely, my
> > eventual goal would be to take a data frame and write it into an XML
> > in the previously shown format.
> >
> > Thank you
> >
> > AG
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
