[R] Irregular Time Series: zoo & its: Pros & Cons

Gabor Grothendieck ggrothendieck at gmail.com
Fri Aug 26 01:01:59 CEST 2005


On 8/25/05, David James <djames at frontierassoc.com> wrote:
> Hello,
> 
> I'm working with irregular time series data.  What do you all think
> about the strengths and weaknesses of the "zoo" and "its" packages?


I have worked on the development of zoo with Achim Zeileis so
I will just speak to that one.

The key thing to notice about zoo is that it is independent of the
index class (i.e. the date, time or date/time class), which makes it
general: you can use whichever class you like.  It supports
all the standard date and time classes in R, and you can add
your own too.  In your case you probably want to use chron
(or POSIXct if you need time zones), or you could create your
own special hourly class.  See the Help Desk article in
R News 4/1 for a discussion of the main classes, and
see the table at the end of that article for various idioms
you may need.
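
To illustrate the index-class independence, here is a minimal sketch that
builds the same values on three different index classes.  The values and
dates are made up for illustration, and only zoo itself is assumed to be
installed:

```r
library(zoo)

vals <- c(10.2, 11.5, 9.8)   # made-up values

# the same data indexed by Date, by POSIXct and by plain numbers
z.date <- zoo(vals, as.Date(c("2005-08-01", "2005-08-03", "2005-08-07")))
z.ct   <- zoo(vals, as.POSIXct(c("2005-08-01 03:45:00",
                                 "2005-08-01 03:56:00",
                                 "2005-08-01 05:10:00"), tz = "GMT"))
z.num  <- zoo(vals, c(1, 3, 7))

class(index(z.date))   # "Date"
class(index(z.ct))     # "POSIXct" "POSIXt"
class(index(z.num))    # "numeric"
```

The zoo object just carries whatever index it is given, so the choice of
class mostly comes down to whether you need time zones, times of day, etc.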

zoo supports not only irregular but also weakly regular
series (zooreg class), i.e. series that have an underlying
regularity (e.g. hourly or monthly) even though not every
hour, month, etc. is filled in.
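
As a small sketch of what "weakly regular" means in practice (a quarterly
series with made-up data, chosen only to keep the example short):

```r
library(zoo)

# regular quarterly series starting in 2000
zr <- zooreg(1:9, start = 2000, frequency = 4)

# drop two observations: the underlying quarterly grid is still there
zr2 <- zr[-c(3, 5)]

class(zr2)                       # still a zooreg series
frequency(zr2)                   # still quarterly
is.regular(zr2)                  # weakly regular
is.regular(zr2, strict = TRUE)   # but not strictly regular (there are gaps)
```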

zoo has a PDF manual (vignette) that you can open from within R:

   library(zoo)
   vignette("zoo")

zoo can work together with the 'its' class and 'ts' class via as.zoo, 
as.its and as.ts.
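
A minimal sketch of the round trip between 'ts' and zoo (made-up quarterly
data; the 'its' conversion is left out here so that the example needs only
zoo):

```r
library(zoo)

# a regular quarterly ts object
x.ts <- ts(c(5, 7, 6, 8), start = c(2005, 1), frequency = 4)

x.zoo <- as.zoo(x.ts)   # ts -> zoo (keeps the quarterly frequency)
back  <- as.ts(x.zoo)   # zoo -> ts

class(x.zoo)
frequency(back)
```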

> 
> I've installed and skimmed the documentation on both packages.  I was
> hoping to get a little guidance from the user community before
> proceeding further.
> 
> In case anyone is interested in my particular problem:  I'm looking
> at some (surface) temperature data from NOAA:
> http://cdo.ncdc.noaa.gov/ulcd/ULCD
> It is (irregular) time series format.  The NOAA data reports year,
> month, date, hour, and minute.  I want to group the data into hourly
> chunks.  However, sometimes there are multiple observations per hour
> -- i.e. an observation at 3:45 and 3:56.  Also, sometimes a particular
> hour may be missing altogether.  I need to clean up the data so that
> each hour has one and only one data point.



Here is an example using the chron date/time class:

library(chron)
library(zoo)

# create a zoo series with random dates/times between tt0 and tt1
# and random values
set.seed(1)
n <- 25
tt0 <- chron("01/01/90")
tt1 <- chron("01/01/00")
tt <- sort(as.numeric(tt1-tt0)*runif(n)+tt0)
z <- zoo(rnorm(n), tt)  # create zoo series from values and date/times

# aggregate by hour, choosing the first data point if there are multiples.
# The arguments are (1) the zoo series, (2) the times rounded down to the
# hour, (3) the aggregate function to use -- indexing in this case, and
# (4) an argument to the indexing function -- in this case it's 1 since
# we want the first element.  See ?aggregate.zoo
z.hr <- aggregate(z, chron(floor(24*as.numeric(tt))/24), "[", 1)

# plot hourly series, see ?plot.zoo
plot(z.hr) 
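
The example above handles multiple observations per hour but not hours
that are missing altogether.  One possible way to deal with those (a
sketch only, using POSIXct instead of chron and three made-up readings)
is to merge the hourly series with an empty series on a complete hourly
grid, so that missing hours show up as NA, and then decide how to fill them:

```r
library(zoo)

# made-up readings: two in hour 03, none in hour 04, one in hour 05
tt <- as.POSIXct(c("2005-08-01 03:45:00",
                   "2005-08-01 03:56:00",
                   "2005-08-01 05:10:00"), tz = "GMT")
z <- zoo(c(20.1, 20.4, 21.0), tt)

# round each time down to the hour and keep the first reading per hour
hr   <- as.POSIXct(trunc(tt, "hours"))
z.hr <- aggregate(z, hr, function(x) x[1])

# complete hourly grid from the first to the last hour
grid <- seq(start(z.hr), end(z.hr), by = "hour")

# merging with a zero-width series on the grid inserts NA for missing hours
z.full <- merge(z.hr, zoo(, grid))

# one possible fill: carry the last observation forward
z.locf <- na.locf(z.full)
```

Whether to carry the last value forward, interpolate (see ?na.approx) or
leave the NAs in place depends on what you plan to do with the data.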

Packages with explicit support for zoo are strucchange, dynlm
and dyn.  (dyn also supports ts and its.)

> 
> I'm relatively new to R, but I think I'm getting a hold on it pretty
> well so far.  I used to do a lot with MATLAB, and there seem to be

Check out 
   http://cran.r-project.org/doc/contrib/R-and-octave-2.txt

> many parallels between it and R.  I have background in public policy
> and econometrics.

Check out
   http://cran.r-project.org/src/contrib/Views/



