[R] Classifying time series by shape over time

Andreas Neumann Andreas.Neumann at em.uni-karlsruhe.de
Tue Mar 21 17:08:48 CET 2006


Dear all,

I have hundreds of thousands of univariate time series of the form:
character "seriesid", vector of Date, vector of integer
(some exemplary data is at the end of the mail)
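
To make the layout concrete, here is a minimal sketch of how one such series could be held in R. It rebuilds series id1 from the end of this mail; the container name "series" is only an assumption for illustration, not something I already have in this form.

```r
# Series id1 from the end of this mail, rebuilt as a data frame:
# one row per month, a Date column and an integer hit count.
id1 <- data.frame(
  dates = seq(as.Date("2004-12-01"), by = "month", length.out = 13),
  hits  = c(3L, 4L, 10L, 6L, 35L, 14L, 33L, 13L, 3L, 9L, 8L, 4L, 3L)
)

# Hypothetical container: one data frame per series, keyed by its seriesid.
series <- list(id1 = id1)

str(series$id1)
```

With hundreds of thousands of series, a named list like this can be processed one series at a time with lapply() or sapply().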

I am trying to find the ones whose shape over time looks like the
histogram of a (possibly skewed) normal distribution:
>  hist(rnorm(200,10,2))
The location of the "mean" is not of interest, i.e. it does not matter
whether the first nonzero observation occurs in the 2nd or the 40th
month of observation. All that matters is this: the series should start
at some point, the hits per month should increase, at some point they
should decrease, and then they should more or less disappear.

Short example (hits in consecutive months, dates omitted):
1. series: 0 0 0 2 5 8 20 42 30 19 6 1 0 0 0                -> Good
2. series: 0 3 8 9 20 6 0 3 25 67 7 1 0 4 60 20 10 0 4      -> Bad

Series 1 would be an ideal case of what I am looking for.
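
One way the "rise, then fall" criterion could be turned into a number is sketched below. This is only an assumption about what a direct test might look like, not an established method: the helper shape_violation() is hypothetical and measures which share of the month-to-month movement goes the "wrong" way (down before the peak, up after it). It does not check that the series starts and ends near zero.

```r
# Hypothetical helper (not from any package): share of the month-to-month
# movement that goes the "wrong" way, i.e. down before the peak or up after it.
# 0 means a clean single rise and fall; larger values mean more extra bumps.
shape_violation <- function(hits) {
  d <- diff(hits)                      # d[i] = hits[i + 1] - hits[i]
  if (all(d == 0)) return(NA_real_)    # flat (or too short) series: nothing to judge
  peak <- which.max(hits)              # position of the first maximum
  i <- seq_along(d)
  falls_before <- sum(pmax(-d[i < peak], 0))   # drops on the way up
  rises_after  <- sum(pmax(d[i >= peak], 0))   # rebounds on the way down
  (falls_before + rises_after) / sum(abs(d))
}

# The two short example series from above
s1 <- c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0)
s2 <- c(0, 3, 8, 9, 20, 6, 0, 3, 25, 67, 7, 1, 0, 4, 60, 20, 10, 0, 4)

shape_violation(s1)   # 0: no movement against the single-peak shape
shape_violation(s2)   # clearly above 0: several separate bumps
```

A cut-off on this score would still have to be chosen by looking at a sample of series.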

Graphical inspection would be easy but is not an option due to the huge
number of series.

Questions:

1. Which, if any, of the many packages that handle time series is
appropriate for my problem?

2. Which general approach seems to be the most straightforward and best
supported by R?
- Is there a way to test the time series directly (preferably)?
- Or do I need to "type-cast" them as some kind of histogram
  data and then test against the pdf of e.g. a normal distribution (but
  how)?
- Or something totally different?
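
To illustrate what I mean by the "type-cast" idea, here is a rough sketch under my own assumptions: the month index is treated as the observed value and the hits as its frequency, and the normalised hits are compared with a normal density of matching mean and standard deviation. The helper normal_mismatch() is hypothetical, and the total variation distance is just one possible choice of comparison, not a formal test. A strongly skewed series would also score worse here, which may or may not be what is wanted.

```r
# Hypothetical helper: treat the month index as the "value" and the hits as
# its frequency, then compare with a normal density of the same mean and sd.
# Returns the total variation distance in [0, 1]; 0 would be a perfect match.
normal_mismatch <- function(hits) {
  if (sum(hits) == 0) return(NA_real_)   # no hits at all
  t <- seq_along(hits)                   # month index 1, 2, ...
  p <- hits / sum(hits)                  # share of all hits per month
  m <- sum(t * p)                        # frequency-weighted mean month
  s <- sqrt(sum((t - m)^2 * p))          # frequency-weighted standard deviation
  if (s == 0) return(NA_real_)           # all hits in a single month
  q <- dnorm(t, mean = m, sd = s)
  q <- q / sum(q)                        # renormalise over the observed months
  0.5 * sum(abs(p - q))
}

# The two short example series from above
s1 <- c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0)
s2 <- c(0, 3, 8, 9, 20, 6, 0, 3, 25, 67, 7, 1, 0, 4, 60, 20, 10, 0, 4)

normal_mismatch(s1)   # small: close to a bell shape
normal_mismatch(s2)   # larger: several peaks, poorly described by one bell
```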


Thank you for your time,

     Andreas Neumann




Data Examples (id1 is good, id2 is bad):

> id1
        dates       hits
1  2004-12-01         3
2  2005-01-01         4
3  2005-02-01        10
4  2005-03-01         6
5  2005-04-01        35
6  2005-05-01        14
7  2005-06-01        33
8  2005-07-01        13
9  2005-08-01         3
10 2005-09-01         9
11 2005-10-01         8
12 2005-11-01         4
13 2005-12-01         3


> id2
        dates       hits
1  2001-01-01         6
2  2001-02-01         5
3  2001-03-01         5
4  2001-04-01         6
5  2001-05-01         2
6  2001-06-01         5
7  2001-07-01         1
8  2001-08-01         6
9  2001-09-01         4
10 2001-10-01        10
11 2001-11-01         0
12 2001-12-01         3
13 2002-01-01         6
14 2002-02-01         5
15 2002-03-01         1
16 2002-04-01         2
17 2002-05-01         4
18 2002-06-01         4
19 2002-07-01         0
20 2002-08-01         1
21 2002-09-01         0
22 2002-10-01         2
23 2002-11-01         2
24 2002-12-01         2
25 2003-01-01         2
26 2003-02-01         3
27 2003-03-01         7



