[R] How to extract the following data

Dieter Menne dieter.menne at menne-biomed.de
Wed Nov 5 08:36:02 CET 2008

RON70 <ron_michael70 <at> yahoo.com> writes:

> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>   <Date>2005-01-17T00:00:00+05:30</Date> 
>   <SecurityID>10149</SecurityID> 
>   <PriceClose>1288.40002</PriceClose> 
>   </Temp>

This looks suspiciously like XML. Let's hope the real data look more like the
sample below, without the "-" and with a proper XML header:

<?xml version="1.0" encoding="utf-8"?>
<Temp diffgr:id="Temp14" msdata:rowOrder="13">
<Temp diffgr:id="Temp15" msdata:rowOrder="14">
<Temp diffgr:id="Temp16" msdata:rowOrder="15">

The following code should give you a start; the dates still need some
massaging. You will see warnings because the prefixes diffgr and msdata are
used without namespace declarations. For a first attempt you can ignore them,
but it is better to get the full data set, which should include those
declarations.

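library(XML)  # provides xmlInternalTreeParse, getNodeSet and xmlValue
doc = xmlInternalTreeParse("temp.xml")
# "//Date" is an XPath query matching every <Date> element; xmlValue returns its text
Date = sapply(getNodeSet(doc, "//Date"), xmlValue)
SecurityID = as.integer(sapply(getNodeSet(doc, "//SecurityID"), xmlValue))
PriceClose = as.numeric(sapply(getNodeSet(doc, "//PriceClose"), xmlValue))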
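If the namespace warnings get in the way before the full file is available, one
workaround is to wrap the bare <Temp> records in a root element that declares
both prefixes; this also gives the text the single root element that XML
requires. This is a minimal sketch, assuming the XML package and assuming the
namespace URIs that .NET DataSet diffgrams usually use. The wrapper element
name `root` and the helper `wrap_records` are hypothetical choices for
illustration, and the only record used is the one quoted in the question.

```r
library(XML)

# Hypothetical helper: wrap bare <Temp> records in a root element that
# declares the diffgr and msdata prefixes. The URIs below are an assumption
# (the usual .NET DataSet ones); use the ones from your full file if they differ.
wrap_records <- function(records) {
  paste0(
    '<root xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1"',
    ' xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">',
    paste(records, collapse = "\n"),
    "</root>"
  )
}

# The single record quoted in the question
records <- '<Temp diffgr:id="Temp14" msdata:rowOrder="13">
  <Date>2005-01-17T00:00:00+05:30</Date>
  <SecurityID>10149</SecurityID>
  <PriceClose>1288.40002</PriceClose>
</Temp>'

# Parse from a string instead of a file
doc <- xmlParse(wrap_records(records), asText = TRUE)

# One column per child element; this assumes every <Temp> has all three children
prices <- data.frame(
  Date = xpathSApply(doc, "//Temp/Date", xmlValue),
  SecurityID = as.integer(xpathSApply(doc, "//Temp/SecurityID", xmlValue)),
  PriceClose = as.numeric(xpathSApply(doc, "//Temp/PriceClose", xmlValue)),
  stringsAsFactors = FALSE
)
print(prices)
```

With the quoted record this yields a one-row data frame. The <Temp> element
itself is in no namespace (only its attributes are prefixed), so plain XPath
such as "//Temp/Date" still matches after the declarations are added.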
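As for massaging the dates: the Date strings are ISO 8601 timestamps with a
+05:30 offset, which strptime's %z does not read with the colon. Below is a
base-R sketch; the helper name `parse_iso_dates` is hypothetical. It keeps
both the absolute instant (in UTC) and the local calendar date.

```r
# Hypothetical helper: convert strings such as "2005-01-17T00:00:00+05:30"
# into R date/time objects using base R only.
parse_iso_dates <- function(x) {
  # %z expects an offset like +0530, so drop the colon in a trailing
  # +HH:MM / -HH:MM offset before parsing
  x_nocolon <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  # Absolute point in time, expressed in UTC
  instant <- as.POSIXct(x_nocolon, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
  # Local calendar date: the first 10 characters of the original string
  local_date <- as.Date(substr(x, 1, 10))
  data.frame(instant = instant, date = local_date)
}

Date <- c("2005-01-17T00:00:00+05:30")
res <- parse_iso_dates(Date)
print(res)
```

The calendar date is taken from the string rather than from the parsed
instant on purpose: midnight at +05:30 is 18:30 UTC on the previous day, so
as.Date() on the UTC instant would return 2005-01-16 for this record.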

