``the real'' time series package

Paul Gilbert pgilbert@bank-banque-canada.ca
Thu, 15 Jul 1999 15:38:44 -0400

>- I'd very much appreciate if Martyn and you and Paul G.
>  (and Brian and Ross and ...
>   and me, we also have a few improvements on S-plus laying around here)
>  could coordinate to produce ``the real'' time series package for R.

I'd like to split this into pieces:

- Some core functions for Splus compatibility. (acf and ar are the important
ones for me right now, but the spectrum stuff is important too.) Namespace
conflicts will be a problem if there are two versions of, for example, acf,
which take different arguments (and give different results, although the bats
acf purposely gives different results from Splus). The objective of "bats" was
to fill this gap.

- Programming tools for operating on the time dimension of data. My tframe
library tries to do most of that. The idea is that other programs should be able
to use window, start, end, trim.na, etc., and not need to worry about whether
the time series is an old S tsp series, a "ts", an "rts", a "cts", a "tf", or
anything else that might come along. The library seems to work well and provides
an extremely useful building block. (I always appreciate constructive comments.)
This is also related to the question of what "[.ts" should do.
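
A rough illustration of the idea above, written in Python rather than S/R and
not taken from the tframe library: generic start, end, and window functions
dispatch on the representation, so calling code never inspects the series type.
The two series classes and the common_window helper are hypothetical names
invented for this sketch.

```python
from dataclasses import dataclass
from functools import singledispatch


# Two hypothetical representations of a time series (illustrative only).
@dataclass
class RegularSeries:
    """Values observed once per period, starting at period `first`."""
    first: int
    values: list


@dataclass
class StampedSeries:
    """Values that carry an explicit, sorted list of integer time stamps."""
    stamps: list
    values: list


# Generic functions: callers use these and never check the series type.
@singledispatch
def start(x):
    raise TypeError(f"no start method for {type(x).__name__}")


@singledispatch
def end(x):
    raise TypeError(f"no end method for {type(x).__name__}")


@singledispatch
def window(x, lo, hi):
    raise TypeError(f"no window method for {type(x).__name__}")


# Methods for the regularly spaced representation.
@start.register(RegularSeries)
def _(x):
    return x.first


@end.register(RegularSeries)
def _(x):
    return x.first + len(x.values) - 1


@window.register(RegularSeries)
def _(x, lo, hi):
    lo, hi = max(lo, start(x)), min(hi, end(x))
    return RegularSeries(lo, x.values[lo - x.first: hi - x.first + 1])


# Methods for the time-stamped representation.
@start.register(StampedSeries)
def _(x):
    return x.stamps[0]


@end.register(StampedSeries)
def _(x):
    return x.stamps[-1]


@window.register(StampedSeries)
def _(x, lo, hi):
    kept = [(t, v) for t, v in zip(x.stamps, x.values) if lo <= t <= hi]
    return StampedSeries([t for t, _ in kept], [v for _, v in kept])


def common_window(a, b):
    """Trim two series of any supported kind to their overlapping period."""
    lo, hi = max(start(a), start(b)), min(end(a), end(b))
    return window(a, lo, hi), window(b, lo, hi)


a = RegularSeries(1990, [1, 2, 3, 4, 5])
b = StampedSeries([1992, 1993, 1994, 1995], [10, 20, 30, 40])
a2, b2 = common_window(a, b)
```

Supporting a new representation then only means registering start, end, and
window methods for it; common_window itself does not change.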

- A convenient method for pulling time series data from other sources into R
(and S and Omega, etc.). My PADI library does that. It works well, but is
getting a bit dated. I've been thinking about a CORBA replacement. (Any
volunteers to work on that?)

- A library of optimization methods which can be used for maximizing
likelihood(data, model(parameters)) would be useful. This is not specific to
time series and there is a fair amount of stuff around, so the problem is really
just making sure it fits the time series context. I have DFP and non-linear
simplex, but I haven't used them in a while.
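
As a minimal sketch of the likelihood(data, model(parameters)) composition (in
Python, and not the DFP or simplex code mentioned above): the optimizer only
sees a function of the parameters, so any general-purpose method can be plugged
in. The AR(1) model, the unit-variance Gaussian likelihood, and the
golden-section search used as a stand-in optimizer are all assumptions made for
the example.

```python
import math


def ar1_model(phi):
    """Hypothetical model constructor: returns a one-step-ahead predictor."""
    def predict(previous):
        return phi * previous
    return predict


def neg_log_likelihood(data, model):
    """Gaussian negative log-likelihood, unit variance, conditional on data[0]."""
    n = len(data) - 1
    sse = sum((y - model(prev)) ** 2 for prev, y in zip(data, data[1:]))
    return 0.5 * sse + 0.5 * n * math.log(2 * math.pi)


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Noise-free series generated with phi = 0.5, so the optimum is known.
data = [0.5 ** t for t in range(10)]

# The objective is a function of the parameter only; data is held fixed.
objective = lambda phi: neg_log_likelihood(data, ar1_model(phi))
phi_hat = golden_section_minimize(objective, -0.99, 0.99)
```

Nothing in golden_section_minimize knows about time series; fitting the time
series context amounts to building the objective from the data and the model.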

- General structures for time series analysis. I am relatively committed to the
structure I have been using for several years now, but I would always consider
improvements. It should at least be considered as a good starting point. The
main elements are:
    1/  an abstract data representation with a distinction between input
        (exogenous or conditioning) data and output (endogenous) data. These
        are x and y in y ~ f(x), but it seems useful to put them in one
        structure because time alignment is important. Then for example,
        window can be applied to the two together rather than separately.
        Currently my code supports an internal implementation (input and
        output are time series matrices) and a PADI implementation (the data
        is on a remote database running the PADI interface).
    2/  an abstract model representation which can be applied to the data.
        Currently my code supports multivariate ARMA and state space models,
        Troll models, and some crude attempts with NN models. But the idea is
        that it should be easy to add other representations.
    3/  an object which combines the above two and adds some statistics
        (likelihood, residuals, etc.)
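
The three elements can be sketched roughly as follows. This is a Python
illustration under simplifying assumptions (univariate series, integer
periods), not the structure used in the library itself; TSData, LinearModel,
FittedModel, and evaluate are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TSData:
    """1/ Input and output series held together so they stay time-aligned."""
    start: int
    outputs: List[float]
    inputs: Optional[List[float]] = None

    def window(self, lo, hi):
        """Trim inputs and outputs together to periods lo..hi inclusive."""
        lo = max(lo, self.start)
        hi = min(hi, self.start + len(self.outputs) - 1)
        i, j = lo - self.start, hi - self.start + 1
        trimmed_inputs = None if self.inputs is None else self.inputs[i:j]
        return TSData(lo, self.outputs[i:j], trimmed_inputs)


@dataclass
class LinearModel:
    """2/ A hypothetical model y_t = gain * x_t; others need only predict()."""
    gain: float

    def predict(self, data):
        return [self.gain * x for x in data.inputs]


@dataclass
class FittedModel:
    """3/ Data and model combined, plus some statistics."""
    data: TSData
    model: object
    residuals: List[float]
    sse: float


def evaluate(model, data):
    """Apply any model with a predict() method to the data."""
    predictions = model.predict(data)
    residuals = [y - p for y, p in zip(data.outputs, predictions)]
    return FittedModel(data, model, residuals, sum(r * r for r in residuals))


data = TSData(1990, outputs=[2.0, 4.0, 6.0, 9.0], inputs=[1.0, 2.0, 3.0, 4.0])
sub = data.window(1991, 1992)
fit = evaluate(LinearModel(2.0), data)
```

Because inputs and outputs live in one object, a single window call keeps them
aligned, and evaluate works for any model representation that supplies predict.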

- Methods for operating on the general structures: estimation techniques,
simulation, conversion, information criteria, etc. This is the fun part. I have
a fair amount of this, but it should be easy to add more.

- "Superstructure" methods for studying estimation techniques and the
forecasting properties of models. For example, one can take a "true model"
(defined as in 2/), simulate it (to get data as in 1/), and examine the small
sample properties of different estimation techniques. Or, take a given data set
and examine the (out-of-sample) forecasting properties of different models. The
idea is that these superstructure methods do not need to be changed if new data
representations or new model representations are added. (E.g., they do not need
a specific method for a new kind of model; they just call "simulate", so the new
model only needs to provide a simulate method.) My library has a fair amount of
this too, but there is always room to add more here.
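
A small sketch of such a superstructure method, again in Python and with
hypothetical names (AR1, least_squares_ar1, monte_carlo) that are not from the
library: the Monte Carlo driver only calls simulate on the "true model" and
passes the result to an estimator, so it needs no changes when a new model kind
is added.

```python
import random


class AR1:
    """Hypothetical "true model": y_t = phi * y_{t-1} + e_t, e_t ~ N(0, 1)."""

    def __init__(self, phi):
        self.phi = phi

    def simulate(self, n, rng):
        y, path = 0.0, []
        for _ in range(n):
            y = self.phi * y + rng.gauss(0.0, 1.0)
            path.append(y)
        return path


def least_squares_ar1(y):
    """One estimation technique: regress y_t on y_{t-1} without intercept."""
    numerator = sum(a * b for a, b in zip(y, y[1:]))
    denominator = sum(a * a for a in y[:-1])
    return numerator / denominator


def monte_carlo(true_model, estimator, n, replications, seed=0):
    """Generic driver: works for any model object with a simulate method."""
    rng = random.Random(seed)
    return [estimator(true_model.simulate(n, rng)) for _ in range(replications)]


estimates = monte_carlo(AR1(0.5), least_squares_ar1, n=200, replications=50)
mean_estimate = sum(estimates) / len(estimates)
```

Comparing the spread of estimates across several estimators, or across sample
sizes n, gives the kind of small sample comparison described above.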

There are several areas in which I think my library is good, and there are
several deficiencies. I've played a bit with frequency domain methods, but that
is a general weakness. I'm very happy to see Adrian's unit root stuff, as it has
been on my "to do" list for several years now. I also get regular requests for
ARCH and GARCH modeling.

I do hope people will work on new stuff rather than things which are already
done by someone else. It might be useful to put a "time series" bullet on CRAN
(in contrib/PACKAGES.html probably) giving a general outline of what is where. I
get pretty upset when I see references to the lack of time series functionality
in R. (My DSE package is in the development area of CRAN only because the
documentation is in HTML rather than R's .Rd format and R's installation
procedures have been changing. The code is not unstable or incomplete. The
version on CRAN installed with 0.63? and the most recent version at
<http://www.bank-banque-canada.ca/pgilbert> installs with 0.64 and 0.65.)

Paul Gilbert

r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html