[R] How to extract following data

Gabor Grothendieck ggrothendieck at gmail.com
Wed Nov 5 14:20:14 CET 2008


Just one comment.  The code posted works as shown,
but if in your case Lines is a character vector with one
element per line, rather than one big string as in my example,
then you will need to add a simplify = c argument to
each strapply call and drop the trailing [[1]], since the
result is then already a single vector rather than a list.
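
A rough sketch of that case follows.  The line-per-element
Lines vector is made up here from the first two records of the
example data, in the form readLines() would return them.  The
values are extracted as character and converted afterwards;
that is a choice made for this sketch, so that it does not
depend on how c combines classed values with the empty results
from non-matching lines.

```r
library(gsubfn)

# Assumed input: the example records stored one line per element.
Lines <- c(
  '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">',
  ' <Date>2005-01-17T00:00:00+05:30</Date>',
  ' <SecurityID>10149</SecurityID>',
  ' <PriceClose>1288.40002</PriceClose>',
  ' </Temp>',
  '- <Temp diffgr:id="Temp15" msdata:rowOrder="14">',
  ' <Date>2005-01-18T00:00:00+05:30</Date>',
  ' <SecurityID>10149</SecurityID>',
  ' <PriceClose>1291.69995</PriceClose>',
  ' </Temp>'
)

# simplify = c concatenates the per-line results into one vector;
# lines without a match contribute nothing, so no [[1]] is needed.
dates  <- strapply(Lines, "....-..-..", simplify = c)
prices <- strapply(Lines, "[0-9]+[.][0-9]+", simplify = c)

# Convert the extracted strings once they are in plain vectors.
DF <- data.frame(date = as.Date(dates), price = as.numeric(prices))
```

Without simplify = c, strapply returns a list with one
component per input line, most of them empty here.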

On Wed, Nov 5, 2008 at 7:32 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Here is another solution made slightly shorter by using
> strapply twice:
>
> z <- zoo(strapply(Lines, "[0-9]+[.][0-9]+", as.numeric)[[1]],
>  strapply(Lines, "....-..-..", as.Date)[[1]])
>
> or to create a data frame:
>
> DF <- data.frame(date = strapply(Lines, "....-..-..", as.Date)[[1]],
>     price = strapply(Lines, "[0-9]+[.][0-9]+", as.numeric)[[1]])
>
> On Wed, Nov 5, 2008 at 6:22 AM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> As others have pointed out, it's close to XML but not quite
>> there; however, you could use strapply in gsubfn to extract
>> the data.  It pulls out the data matching the regular expression,
>> giving a vector, vec, consisting of: date price date price ...
>> Pulling out the odd and even elements separately and
>> converting them to Date and numeric, respectively, gives the
>> resulting data.frame.
>>
>> See
>> http://gsubfn.googlecode.com
>> for more on the gsubfn package and
>> the three zoo vignettes in the zoo package for more on it.
>>
>> Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>>  <Date>2005-01-17T00:00:00+05:30</Date>
>>  <SecurityID>10149</SecurityID>
>>  <PriceClose>1288.40002</PriceClose>
>>  </Temp>
>> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
>>  <Date>2005-01-18T00:00:00+05:30</Date>
>>  <SecurityID>10149</SecurityID>
>>  <PriceClose>1291.69995</PriceClose>
>>  </Temp>
>> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
>>  <Date>2005-01-19T00:00:00+05:30</Date>
>>  <SecurityID>10149</SecurityID>
>>  <PriceClose>1288.19995</PriceClose>
>>  </Temp>'
>>
>> library(gsubfn)
>> vec <- strapply(Lines, "....-..-..|[0-9]+[.][0-9]+")[[1]]
>> ix <- seq_along(vec) %% 2 == 1
>> DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))
>>
>> # or, instead of the last line, you could convert it to a zoo object so
>> # that it's in a more convenient form for time series manipulation:
>>
>> library(zoo)
>> z <- zoo(as.numeric(vec[!ix]), as.Date(vec[ix]))
>>
>>
>>
>> On Wed, Nov 5, 2008 at 1:22 AM, RON70 <ron_michael70 at yahoo.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> I have this kind of raw dataset :
>>>
>>> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>>>  <Date>2005-01-17T00:00:00+05:30</Date>
>>>  <SecurityID>10149</SecurityID>
>>>  <PriceClose>1288.40002</PriceClose>
>>>  </Temp>
>>> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
>>>  <Date>2005-01-18T00:00:00+05:30</Date>
>>>  <SecurityID>10149</SecurityID>
>>>  <PriceClose>1291.69995</PriceClose>
>>>  </Temp>
>>> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
>>>  <Date>2005-01-19T00:00:00+05:30</Date>
>>>  <SecurityID>10149</SecurityID>
>>>  <PriceClose>1288.19995</PriceClose>
>>>  </Temp>
>>>
>>> I was looking for some R procedure to extract data from this, that should be
>>> in following format :
>>>
>>> 2005-01-17 1288.40002
>>> 2005-01-18 1291.69995
>>> 2005-01-19 1288.19995
>>>
>>> Can R help me to do this?
>>>
>>
>


