[R] The time series analysis functions/packages don't seem to like my data

Mark Knecht markknecht at gmail.com
Sat Jul 4 02:40:28 CEST 2009


On Fri, Jul 3, 2009 at 4:34 PM, Ted Byers<r.ted.byers at gmail.com> wrote:
> Hi Mark
>
> Thanks for replying.
>
> Here is a short snippet that reproduces the problem:
>
> library(PerformanceAnalytics)
> thedata = read.csv("K:\\Work\\SignalTest\\BP.csv", sep = "\t", header
> = FALSE, na.strings="")
> thedata
> x = as.timeseries(thedata)
> x
> table.Drawdowns(thedata,top = 10)
> table.Drawdowns(thedata$V2, top = 10)
>
> The object 'thedata' has exactly what I expected. the line 'thedata'
> prints the correct contents of the file with each row prepended by a
> line number.  The last few lines are:
>
> 8191 2009-06-17 48.40
> 8192 2009-06-18 47.72
> 8193 2009-06-19 48.83
> 8194 2009-06-22 46.85
> 8195 2009-06-23 47.11
> 8196 2009-06-24 46.97
> 8197 2009-06-25 47.43
>
> The number of lines (8197), dates (and their format) and prices are correct.
>
> The last four lines produce the following output:
>> x = as.timeseries(thedata)
> Error: could not find function "as.timeseries"
>> x
> Error: object 'x' not found
>> table.Drawdowns(thedata,top = 10)
> Error in 1 + na.omit(x) : non-numeric argument to binary operator
>> table.Drawdowns(thedata$V2, top = 10)
> Error in if (thisSign == priorSign) { :
>  missing value where TRUE/FALSE needed
>>
>
> Are the functions in your example in Rmetrics or PerformanceAnalytics?
> (like I said, I am just beginning this exploration, and I started with
> table.Drawdowns because it produces information that I need first)
> And given that my data is in tab delimited files, and can be read
> using read.csv, how do I feed my data into your four statements?
>
> My guess is I am missing something in coercing my data in (the data
> frame?) thedata into a timeseries array of the sort the time series
> analysis functions need: and one of the things I find a bit confusing
> is that some of the documentation for this mentions S3 classes and
> some mentions S4 classes (I don't know if that means I have to make
> multiple copies of my data to get the output I need).  I could coerce
> thedata$V2 into a numeric vector, but I'd rather not separate the
> prices from their dates unless that is necessary (how would one
> produce monthly, annual or annualized rates of return if one did
> that?).
>
> Thanks
>
> Ted
>
> On Fri, Jul 3, 2009 at 6:39 PM, Mark Knecht<markknecht at gmail.com> wrote:
>> On Fri, Jul 3, 2009 at 2:48 PM, Ted Byers<r.ted.byers at gmail.com> wrote:
>>> I have hundreds of megabytes of price data time series, and perl
>>> scripts that extract it to tab delimited files (I have C++ programs
>>> that must analyse this data too, so I get Perl to extract it rather
>>> than have multiple connections to the DB).
>>>
>>> I can read the data into an R object without any problems.
>>>
>>> thedata = read.csv("K:\\Work\\SignalTest\\BP.csv", sep = "\t", header
>>> = FALSE, na.strings="")
>>> thedata
>>>
>>> The above statements give me precisely what I expect.  The last few
>>> lines of output are:
>>> 8190 2009-06-16 49.30
>>> 8191 2009-06-17 48.40
>>> 8192 2009-06-18 47.72
>>> 8193 2009-06-19 48.83
>>> 8194 2009-06-22 46.85
>>> 8195 2009-06-23 47.11
>>> 8196 2009-06-24 46.97
>>> 8197 2009-06-25 47.43
>>>
>>> I have loaded Rmetrics and PerformanceAnalytics, among other packages.
>>>  I tried as.timeseries, but R2.9.1 tells me there is no such function.
>>> I tried as.ts(thedata), but that only replaces the date field by the
>>> row label in 'thedata'.
>>>
>>> If I apply the PerformanceAnalytics drawdowns function to either
>>> thedata or thedata$V2, I get errors:
>>>> table.Drawdowns(thedata,top = 10)
>>> Error in 1 + na.omit(x) : non-numeric argument to binary operator
>>>> table.Drawdowns(thedata$V2, top = 10)
>>> Error in if (thisSign == priorSign) { :
>>>  missing value where TRUE/FALSE needed
>>>>
>>>
>>> thedata$V2 by itself does give me the price data from the file.
>>>
>>> I am a relative novice in using R for timeseries, so I wouldn't be
>>> surprised if I missed something that would be obvious to someone more
>>> practiced in using R, but I don't see what that could be from the
>>> documentation of the functions I am looking at using.  I have no
>>> shortage of data, and I don't want to write C++ code, or perl code, to
>>> do all the kinds of calculations provided in Rmetrics and
>>> PerformanceAnalytics, but getting my data into the functions these
>>> packages provide is killing me!
>>>
>>> What did I miss?
>>>
>>> Thanks
>>>
>>> Ted
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Could you supply some portion of the results when you run the example
>> on your data? The example goes like:
>>
>> data(edhec)
>> R=edhec[,"Funds.of.Funds"]
>> findDrawdowns(R)
>> sortDrawdowns(findDrawdowns(R))
>>
>> How are you using the function with your data?
>>
>> - Mark
>>
>

Sorry, I should have said that findDrawdowns is part of
PerformanceAnalytics. I've added the require() call to the code below
so you can just copy it and run it all.

require(PerformanceAnalytics)
data(edhec)
class(edhec)
R=edhec[,"Funds.of.Funds"]
class(R)
findDrawdowns(R)
sortDrawdowns(findDrawdowns(R))

This is a subject that interests me. My own data is read in using
read.csv, which I think would pose similar problems should I ever want
to use these functions, so I'm interested in how to solve it. I'm a
newbie, so be VERY careful about anything I say!

What I see is that edhec is of class zoo, as is R. Here's what I did to check:

> require(PerformanceAnalytics)
> data(edhec)
> class(edhec)
[1] "zoo"
> R=edhec[,"Funds.of.Funds"]
> class(R)
[1] "zoo"
> findDrawdowns(R)
<SNIP>

> sortDrawdowns(findDrawdowns(R))
<SNIP>


Note that R, class zoo, has dates as the names and then a single column of data:

>
> R
Jan 1997 Feb 1997 Mar 1997 Apr 1997 May 1997 Jun 1997 Jul 1997 Aug 1997
  0.0317   0.0106  -0.0077   0.0009   0.0275   0.0225   0.0435   0.0051
Sep 1997 Oct 1997 Nov 1997 Dec 1997 Jan 1998 Feb 1998 Mar 1998 Apr 1998
  0.0334  -0.0099  -0.0034   0.0089  -0.0036   0.0256   0.0373   0.0125
May 1998 Jun 1998 Jul 1998 Aug 1998 Sep 1998 Oct 1998 Nov 1998 Dec 1998
 -0.0072   0.0021  -0.0007  -0.0616  -0.0037  -0.0002   0.0220   0.0222
Jan 1999 Feb 1999 Mar 1999 Apr 1999 May 1999 Jun 1999 Jul 1999 Aug 1999
  0.0202  -0.0063   0.0213   0.0400   0.0119   0.0282   0.0088   0.0028
<SNIP>
>

> names(R)
 [1] "1997-01-31" "1997-02-28" "1997-03-31" "1997-04-30" "1997-05-31"
 [6] "1997-06-30" "1997-07-31" "1997-08-31" "1997-09-30" "1997-10-31"
[11] "1997-11-30" "1997-12-31" "1998-01-31" "1998-02-28" "1998-03-31"
[16] "1998-04-30" "1998-05-31" "1998-06-30" "1998-07-31" "1998-08-31"
[21] "1998-09-30" "1998-10-31" "1998-11-30" "1998-12-31" "1999-01-31"
[26] "1999-02-28" "1999-03-31" "1999-04-30" "1999-05-31" "1999-06-30"
<SNIP>

> as.matrix(R)
                 R
1997-01-31  0.0317
1997-02-28  0.0106
1997-03-31 -0.0077
1997-04-30  0.0009
1997-05-31  0.0275
1997-06-30  0.0225
1997-07-31  0.0435
1997-08-31  0.0051
1997-09-30  0.0334
1997-10-31 -0.0099
1997-11-30 -0.0034
1997-12-31  0.0089
1998-01-31 -0.0036
1998-02-28  0.0256
1998-03-31  0.0373
1998-04-30  0.0125

<SNIP>
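
As far as I can tell, a zoo object keeps the dates in an index,
separate from the values themselves. Here is a minimal sketch,
assuming the zoo package is installed. The three-point series z is a
hypothetical stand-in built by hand from the first edhec values
printed above, not the real data set:

```r
library(zoo)

# Hypothetical three-point series using the first values printed above.
z <- zoo(c(0.0317, 0.0106, -0.0077),
         order.by = as.Date(c("1997-01-31", "1997-02-28", "1997-03-31")))

index(z)     # the dates that order the series
coredata(z)  # the plain numeric values, without the dates
```

index() and coredata() are the zoo accessors for the two halves of the
object, so the dates and the values travel together in one object.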

So the question, which I haven't answered yet, is how to coerce data
read with read.csv into that format. If we can do that, we can then see
whether the functions work with dollar prices as opposed to the
fractional returns in the edhec data.
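
One possible route, which I have NOT tried on the real BP.csv, is to
build the zoo object directly from the two columns of the data frame
and then turn prices into returns. I believe the drawdown functions
are documented as taking returns rather than prices. In this sketch
the small data frame is a hypothetical stand-in for thedata, using a
few of the rows quoted earlier in the thread, and the names prices and
returns are my own:

```r
library(zoo)

# Hypothetical stand-in for thedata: V1 = date, V2 = price,
# taken from rows quoted earlier in this thread.
thedata <- data.frame(
  V1 = c("2009-06-17", "2009-06-18", "2009-06-19", "2009-06-22"),
  V2 = c(48.40, 47.72, 48.83, 46.85),
  stringsAsFactors = FALSE
)

# Prices become the data and the parsed dates become the index.
# as.character() guards against V1 having been read in as a factor.
prices <- zoo(thedata$V2,
              order.by = as.Date(as.character(thedata$V1),
                                 format = "%Y-%m-%d"))

# Simple returns: (P_t - P_{t-1}) / P_{t-1}.
# lag(prices, k = -1) lines the previous price up with each date.
returns <- diff(prices) / lag(prices, k = -1)

class(prices)
returns
```

If that holds up, returns (rather than thedata or thedata$V2) would be
the thing to hand to table.Drawdowns(returns, top = 10), but treat
that as an assumption until someone has run it.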

I'll be looking at this but don't expect much. I'm out in the deep end
at this point.

Cheers,
Mark



