[R] Create a Data Frame from an XML

Duncan Temple Lang dtemplelang at ucdavis.edu
Wed Jan 23 02:04:46 CET 2013


Hi Adam

 [You seem to have sent the same message twice to the mailing list.]

There are various strategies/approaches to creating the data frame
from the XML.

Perhaps the approach closest to yours is

  xmlRoot(doc)[ "row" ]

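which returns a list of all the XML nodes named "row" that are
children of the root node <data>. (With double brackets, top[["row"]]
returns only the first such node, which is why you saw a single row.)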
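
To make the difference between the two kinds of brackets concrete, here is a
minimal sketch. It assumes a two-row inline copy of your data, parsed from a
string with xmlParse(asText = TRUE) rather than from Sample2.xml; the variable
names txt, first and rows are only for illustration.

```r
library(XML)

# Assumption: a shortened inline copy of the sample data, not the real file.
txt <- '<data>
  <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000" />
  <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
</data>'
doc <- xmlParse(txt, asText = TRUE)
top <- xmlRoot(doc)

first <- top[["row"]]  # double brackets: only the first child named "row"
rows  <- top["row"]    # single brackets: a list of all children named "row"

length(rows)                    # number of <row> nodes found
xmlGetAttr(rows[[2]], "BRAND")  # one attribute of the second <row>
```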

So
  sapply(xmlRoot(doc) [ "row" ], xmlAttrs)

yields a character matrix with as many columns as there are <row> nodes
and with 4 rows, one for each of the BRAND, NUM, YEAR and VALUE attributes.

So

  d = t( sapply(xmlRoot(doc) [ "row" ], xmlAttrs) )

gives you a matrix with the correct orientation (one row per <row> node,
one column per attribute). You can now turn that into a data frame and
convert the columns to numbers, etc. as you want with regular R commands
(i.e. independently of the XML).
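
As a rough sketch of that last step, something along these lines should work.
It assumes a three-row inline copy of your data (including the non-numeric
"PRICLESS" entry) instead of Sample2.xml, and the choice to turn non-numeric
VALUE entries into NA is my assumption; you may prefer to keep VALUE as
character.

```r
library(XML)

# Assumption: a shortened inline copy of the sample data, not the real file.
txt <- '<data>
  <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000" />
  <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
  <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
</data>'
doc <- xmlParse(txt, asText = TRUE)

# Character matrix: one row per <row> node, one column per attribute.
d <- t(sapply(xmlRoot(doc)["row"], xmlAttrs))
rownames(d) <- NULL  # drop the repeated "row" labels

# Everything is still character at this point.
df <- as.data.frame(d, stringsAsFactors = FALSE)

# Convert the columns that should be numeric.
df$NUM  <- as.integer(df$NUM)
df$YEAR <- as.integer(df$YEAR)
# "PRICLESS" is not a number, so it becomes NA (warning suppressed on purpose).
df$VALUE <- suppressWarnings(as.numeric(df$VALUE))

str(df)
```

If you would rather detect the non-numeric entries than silently lose them,
check is.na(df$VALUE) against the original character column before
overwriting it.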


 D.

On 1/22/13 1:43 PM, Adam Gabbert wrote:
>  Hello,
> 
> I'm attempting to read information from an XML into a data frame in R using
> the "XML" package. I am unable to get the data into a data frame as I would
> like.  I have some sample code below.
> 
> *XML Code:*
> 
> Header...
> 
> Data I want in a data frame:
> 
>    <data>
>   <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000" />
>   <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
>   <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" />
>   <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" />
>   <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" />
>   <row BRAND="FORD" NUM="1" YEAR="2004" VALUE="17000" />
>   <row BRAND="GMC" NUM="1" YEAR="2005" VALUE="15000" />
>   <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
>   <row BRAND="FORD" NUM="1" YEAR="2007" VALUE="17500" />
>   <row BRAND="GMC" NUM="1" YEAR="2008" VALUE="22000" />
>   </data>
> 
> *R Code:*
> 
> doc <- xmlInternalTreeParse ("Sample2.xml")
> top <- xmlRoot (doc)
> xmlName (top)
> names (top)
> art <- top [["row"]]
> art
> 
> *Output:*
> 
> > art
> <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000"/>
> 
> 
> 
> This is where I am having difficulties.  I am unable to "access" additional
> rows; ( i.e.  <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" /> )
> 
> and I am unable to access the individual entries to actually create the
> data frame.  The data frame I would like is as follows:
> 
> BRAND    NUM    YEAR    VALUE
> GMC      1      1999    10000
> FORD     2      2000    12000
> GMC      1      2001    12500
>     etc........
> 
> Any help or suggestions would be appreciated.  Conversely, my eventual goal
> would be to take a data frame and write it into an XML in the previously
> shown format.
> 
> Thank you
> 
> AG
> 


