[R] How to extract following data

Gabor Grothendieck ggrothendieck at gmail.com
Wed Nov 5 12:22:40 CET 2008


As others have pointed out, it's close to XML but not quite
there; however, you could use strapply in the gsubfn package to
extract the data.  It pulls out the substrings matching the regular
expression, giving a character vector, vec, of the form:
date price date price ...
Pulling out the odd (date) and even (price) elements separately and
converting them to Date and numeric, respectively, gives the
resulting data.frame.
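
For anyone without gsubfn installed, the same idea can be sketched in
base R.  This is only an illustration of the steps described above, not
the gsubfn code from this post: it swaps in gregexpr/regmatches for
strapply, spells the date pattern out with digit classes, and uses just
two abbreviated records.  The names pat and is_date are made up for the
example.

```r
# Two records in the same shape as the sample data (abbreviated).
Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
 <Date>2005-01-17T00:00:00+05:30</Date>
 <SecurityID>10149</SecurityID>
 <PriceClose>1288.40002</PriceClose>
 </Temp>
- <Temp diffgr:id="Temp15" msdata:rowOrder="14">
 <Date>2005-01-18T00:00:00+05:30</Date>
 <SecurityID>10149</SecurityID>
 <PriceClose>1291.69995</PriceClose>
 </Temp>'

# Match either a yyyy-mm-dd date or a number containing a decimal point.
# (Assumption: prices always have a decimal point, as in the sample.)
pat <- "[0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]+[.][0-9]+"

# Base R stand-in for strapply: all matches, in order of appearance.
vec <- regmatches(Lines, gregexpr(pat, Lines))[[1]]

# vec alternates date, price, date, price, ... so odd positions are dates.
is_date <- seq_along(vec) %% 2 == 1
DF <- data.frame(date  = as.Date(vec[is_date]),
                 price = as.numeric(vec[!is_date]))
```

The SecurityID values are skipped only because they contain no decimal
point; data with integer prices would need a different pattern.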

See
http://gsubfn.googlecode.com
for more on the gsubfn package, and see
the three vignettes in the zoo package for more on zoo.

Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
 <Date>2005-01-17T00:00:00+05:30</Date>
 <SecurityID>10149</SecurityID>
 <PriceClose>1288.40002</PriceClose>
 </Temp>
- <Temp diffgr:id="Temp15" msdata:rowOrder="14">
 <Date>2005-01-18T00:00:00+05:30</Date>
 <SecurityID>10149</SecurityID>
 <PriceClose>1291.69995</PriceClose>
 </Temp>
- <Temp diffgr:id="Temp16" msdata:rowOrder="15">
 <Date>2005-01-19T00:00:00+05:30</Date>
 <SecurityID>10149</SecurityID>
 <PriceClose>1288.19995</PriceClose>
 </Temp>'

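library(gsubfn)
# match either a yyyy-mm-dd style date or a number with a decimal point
vec <- strapply(Lines, "....-..-..|[0-9]+[.][0-9]+")[[1]]
# odd positions hold the dates, even positions the prices
ix <- seq_along(vec) %% 2 == 1
DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))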

# or, instead of the data.frame line above, you could convert vec to a
# zoo object so that it's in a more convenient form for time series
# manipulation:

library(zoo)
z <- zoo(as.numeric(vec[!ix]), as.Date(vec[ix]))
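
As a rough illustration of what "more convenient" can mean here, the
sketch below rebuilds the same series by hand (so it runs without
gsubfn) and applies a few standard zoo functions.  The names chg, later
and DF2 are made up for the example.

```r
library(zoo)

# Same three dates and closing prices as in the sample data.
z <- zoo(c(1288.40002, 1291.69995, 1288.19995),
         as.Date(c("2005-01-17", "2005-01-18", "2005-01-19")))

# Day-over-day change in the closing price (one element shorter than z).
chg <- diff(z)

# Keep only the observations on or after a given date.
later <- window(z, start = as.Date("2005-01-18"))

# Convert back to a data.frame if that form is needed after all.
DF2 <- data.frame(date = index(z), price = coredata(z))
```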



On Wed, Nov 5, 2008 at 1:22 AM, RON70 <ron_michael70 at yahoo.com> wrote:
>
> Hi everyone,
>
> I have this kind of raw dataset :
>
> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>  <Date>2005-01-17T00:00:00+05:30</Date>
>  <SecurityID>10149</SecurityID>
>  <PriceClose>1288.40002</PriceClose>
>  </Temp>
> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
>  <Date>2005-01-18T00:00:00+05:30</Date>
>  <SecurityID>10149</SecurityID>
>  <PriceClose>1291.69995</PriceClose>
>  </Temp>
> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
>  <Date>2005-01-19T00:00:00+05:30</Date>
>  <SecurityID>10149</SecurityID>
>  <PriceClose>1288.19995</PriceClose>
>  </Temp>
>
> I was looking for some R procedure to extract data from this, that should be
> in following format :
>
> 2005-01-17 1288.40002
> 2005-01-18 1291.69995
> 2005-01-19 1288.19995
>
> Can R help me to do this?
>


