[R] The time series analysis functions/packages don't seem to like my data

Ted Byers r.ted.byers at gmail.com
Sat Jul 4 02:44:20 CEST 2009


Hi Gabor,  Thanks.

On Fri, Jul 3, 2009 at 8:25 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> # 1. You can directly read your data into a zoo series like this:
>
> Lines <- "8190 2009-06-16 49.30
> 8191 2009-06-17 48.40
> 8192 2009-06-18 47.72
> 8193 2009-06-19 48.83
> 8194 2009-06-22 46.85
> 8195 2009-06-23 47.11
> 8196 2009-06-24 46.97
> 8197 2009-06-25 47.43"
>

OK.  Now I have to read up on zoo too.  I was going to get to that, as
I saw it mentioned in a couple views related to analyzing financial
data.

I apologize if this is a naive question, but if I am reading my data
successfully using:

thedata = read.csv("K:\\Work\\SignalTest\\BP.csv", sep = "\t",
                   header = FALSE, na.strings = "")

can my "thedata" be used in the same way as your "Lines"?  Or would
that be a different function call?
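
One possible route from a data frame to a zoo series is sketched below. It assumes the first column holds the dates and the second the prices (V1 and V2 are the default names read.csv assigns with header = FALSE), and it uses a four-row stand-in data frame rather than the real file, so treat it as an illustration and not as the method recommended in this thread.

```r
library(zoo)

# Stand-in for the data frame returned by read.csv(); V1 holds the dates
# and V2 the prices, matching the printed output quoted in this thread.
thedata <- data.frame(
  V1 = c("2009-06-16", "2009-06-17", "2009-06-18", "2009-06-19"),
  V2 = c(49.30, 48.40, 47.72, 48.83),
  stringsAsFactors = FALSE
)

# Build the zoo series directly: the values first, then the ordering index.
# as.character() guards against V1 having been read in as a factor.
z <- zoo(thedata$V2, order.by = as.Date(as.character(thedata$V1)))

# Log returns stay attached to their dates because diff() has a zoo method.
r <- diff(log(z))
r
```

The point of going through zoo() here is that the prices never get separated from their dates, so later aggregation by month or year remains possible.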

What is your "Lines" anyway: a vector containing a series of strings?
a matrix of strings? one long string distributed over a series of
lines?
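
A quick way to check what kind of object "Lines" is, sketched with the first three rows quoted above (the names rows and df are only for illustration):

```r
Lines <- "8190 2009-06-16 49.30
8191 2009-06-17 48.40
8192 2009-06-18 47.72"

# A character vector of length 1: one long string whose line breaks
# are embedded newline characters.
length(Lines)   # 1

# textConnection() lets functions that expect a file read the string
# line by line.
con <- textConnection(Lines)
rows <- readLines(con)
close(con)
length(rows)    # 3

# read.table() therefore parses it exactly as it would a small file.
df <- read.table(textConnection(Lines))
dim(df)         # 3 3
```

So "Lines" plays the role of the file, while a data frame such as "thedata" is already the parsed result of reading one.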

> library(zoo)
> z <- read.zoo(textConnection(Lines), index = 2)
>
> # and from that you can readily convert it to
> # other time series formats if need be.
>
> # 2. Read ?table.Drawdowns.  It asks for __returns__, not raw
> # data as input.
>
OOPS, so I'll need an extra step.  It is trivial to convert my data to
daily deltas.  I was more concerned at the moment with just getting my
time series data into a form the time series functions require.
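
As a rough sketch of that extra step in base R, assuming the dates are in V1 and the prices in V2 (the eight rows are the ones quoted in this message; the data frame name returns and its column names are made up for illustration):

```r
# Stand-in for the last eight rows of the real data.
thedata <- data.frame(
  V1 = as.Date(c("2009-06-16", "2009-06-17", "2009-06-18", "2009-06-19",
                 "2009-06-22", "2009-06-23", "2009-06-24", "2009-06-25")),
  V2 = c(49.30, 48.40, 47.72, 48.83, 46.85, 47.11, 46.97, 47.43)
)

p <- thedata$V2
n <- length(p)

# Each return belongs to the later of the two dates it spans,
# so the first date is dropped.
returns <- data.frame(
  date   = thedata$V1[-1],
  simple = p[-1] / p[-n] - 1,   # (P_t - P_{t-1}) / P_{t-1}
  log    = diff(log(p))         # log(P_t / P_{t-1})
)
returns
```

Note that these are relative changes, not the raw price differences that diff(p) alone would give.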

Thank you.  This is quite useful.

Cheers

Ted

> library(PerformanceAnalytics)
> table.Drawdowns(diff(log(z$V3)))
>
> That gives me an error, and looking into it, it seems
> likely that table.Drawdowns fails when there is only one
> drawdown.
>
> library(help = PerformanceAnalytics)
>
> will give you the author's email address to whom you
> can report the problem.
>
> On Fri, Jul 3, 2009 at 7:34 PM, Ted Byers<r.ted.byers at gmail.com> wrote:
>> Hi Mark
>>
>> Thanks for replying.
>>
>> Here is a short snippet that reproduces the problem:
>>
>> library(PerformanceAnalytics)
>> thedata = read.csv("K:\\Work\\SignalTest\\BP.csv", sep = "\t", header
>> = FALSE, na.strings="")
>> thedata
>> x = as.timeseries(thedata)
>> x
>> table.Drawdowns(thedata,top = 10)
>> table.Drawdowns(thedata$V2, top = 10)
>>
>> The object 'thedata' has exactly what I expected. The line 'thedata'
>> prints the correct contents of the file with each row prepended by a
>> line number.  The last few lines are:
>>
>> 8191 2009-06-17 48.40
>> 8192 2009-06-18 47.72
>> 8193 2009-06-19 48.83
>> 8194 2009-06-22 46.85
>> 8195 2009-06-23 47.11
>> 8196 2009-06-24 46.97
>> 8197 2009-06-25 47.43
>>
>> The number of lines (8197), dates (and their format) and prices are correct.
>>
>> The last four lines produce the following output:
>>> x = as.timeseries(thedata)
>> Error: could not find function "as.timeseries"
>>> x
>> Error: object 'x' not found
>>> table.Drawdowns(thedata,top = 10)
>> Error in 1 + na.omit(x) : non-numeric argument to binary operator
>>> table.Drawdowns(thedata$V2, top = 10)
>> Error in if (thisSign == priorSign) { :
>>  missing value where TRUE/FALSE needed
>>>
>>
>> Are the functions in your example in Rmetrics or PerformanceAnalytics?
>> (like I said, I am just beginning this exploration, and I started with
>> table.Drawdowns because it produces information that I need first)
>> And given that my data is in tab delimited files, and can be read
>> using read.csv, how do I feed my data into your four statements?
>>
>> My guess is I am missing something in coercing my data in (the data
>> frame?) thedata into a timeseries array of the sort the time series
>> analysis functions need: and one of the things I find a bit confusing
>> is that some of the documentation for this mentions S3 classes and
>> some mentions S4 classes (I don't know if that means I have to make
>> multiple copies of my data to get the output I need).  I could coerce
>> thedata$V2 into a numeric vector, but I'd rather not separate the
>> prices from their dates unless that is necessary (how would one
>> produce monthly, annual or annualized rates of return if one did
>> that?).
>>
>> Thanks
>>
>> Ted
>>
>> On Fri, Jul 3, 2009 at 6:39 PM, Mark Knecht<markknecht at gmail.com> wrote:
>>> On Fri, Jul 3, 2009 at 2:48 PM, Ted Byers<r.ted.byers at gmail.com> wrote:
>>>> I have hundreds of megabytes of price data time series, and perl
>>>> scripts that extract it to tab delimited files (I have C++ programs
>>>> that must analyse this data too, so I get Perl to extract it rather
>>>> than have multiple connections to the DB).
>>>>
>>>> I can read the data into an R object without any problems.
>>>>
>>>> thedata = read.csv("K:\\Work\\SignalTest\\BP.csv", sep = "\t", header
>>>> = FALSE, na.strings="")
>>>> thedata
>>>>
>>>> The above statements give me precisely what I expect.  The last few
>>>> lines of output are:
>>>> 8190 2009-06-16 49.30
>>>> 8191 2009-06-17 48.40
>>>> 8192 2009-06-18 47.72
>>>> 8193 2009-06-19 48.83
>>>> 8194 2009-06-22 46.85
>>>> 8195 2009-06-23 47.11
>>>> 8196 2009-06-24 46.97
>>>> 8197 2009-06-25 47.43
>>>>
>>>> I have loaded Rmetrics and PerformanceAnalytics, among other packages.
>>>>  I tried as.timeseries, but R 2.9.1 tells me there is no such function.
>>>> I tried as.ts(thedata), but that only replaces the date field by the
>>>> row label in 'thedata'.
>>>>
>>>> If I apply the performance analytics drawdowns function to either
>>>> thedata or thedata$V2, I get errors:
>>>>> table.Drawdowns(thedata,top = 10)
>>>> Error in 1 + na.omit(x) : non-numeric argument to binary operator
>>>>> table.Drawdowns(thedata$V2, top = 10)
>>>> Error in if (thisSign == priorSign) { :
>>>>  missing value where TRUE/FALSE needed
>>>>>
>>>>
>>>> thedata$V2 by itself does give me the price data from the file.
>>>>
>>>> I am a relative novice in using R for timeseries, so I wouldn't be
>>>> surprised if I missed something that would be obvious to someone more
>>>> practiced in using R, but I don't see what that could be from the
>>>> documentation of the functions I am looking at using.  I have no
>>>> shortage of data, and I don't want to write C++ code, or perl code, to
>>>> do all the kinds of calculations provided in Rmetrics and
>>>> PerformanceAnalytics, but getting my data into the functions these
>>>> packages provide is killing me!
>>>>
>>>> What did I miss?
>>>>
>>>> Thanks
>>>>
>>>> Ted
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> Could you supply some portion of the results when you run the example
>>> on your data? The example goes like:
>>>
>>> data(edhec)
>>> R=edhec[,"Funds.of.Funds"]
>>> findDrawdowns(R)
>>> sortDrawdowns(findDrawdowns(R))
>>>
>>> How are you using the function with your data?
>>>
>>> - Mark
>>>
>>
>



