[R] ZOO: Learning to apply it to my data

Gabor Grothendieck ggrothendieck at gmail.com
Wed Sep 14 01:22:13 CEST 2011


On Tue, Sep 13, 2011 at 2:07 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
>  I have read ?zoo but am not sure how to relate the parameters (x,
> order.by, frequency, and style) to my data.frame. The structure of the
> data.frame is
>
> 'data.frame':   11169 obs. of  4 variables:
>  $ stream  : Factor w/ 37 levels "Burns","CIL",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ sampdate: Date, format: "1987-07-23" "1987-09-17" ...
>  $ param   : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ quant   : num  0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0 ...
>
>  The numeric column ('x' in zoo, I believe) is associated with the unique
> combination of param, sampdate, and stream in each row. For example:
>
> tail(streamdata)
>       stream   sampdate param   quant
> 11164 Winters 2010-06-30   SO4 120.000
> 11165 Winters 2010-06-30    Zn   0.010
> 11166 Winters 2011-06-06    As   0.005
> 11167 Winters 2011-06-06    Cl   5.000
> 11168 Winters 2011-06-06   SO4 150.000
> 11169 Winters 2011-06-06    Zn   0.010
>
>  I'm in the early exploratory stage of understanding these data, but want
> to produce time series plots and analyses by stream and param using zoo
> objects since the sampdate varies by both stream and chemical.
>
>  I assume that order.by, the index, is sampdate. The frequency option is
> FALSE because these samples are not temporally regular. I've no idea what to
> do with the style option, if anything.
>
>  Most of the examples I see on using R (including in the lattice book I'm
> now reading) have one or more numeric columns in the data.frame associated
> with a single factor. I have a single numeric column associated with two
> factors and a date.
>
>  If there are other documents or books I should read to learn how to
> effectively use the zoo package for my project (in addition to zoo.pdf that
> lists the methods and is quite obtuse to me), please point me to them. I
> would greatly appreciate any and all help in getting up to speed with zoo.
>

As described in ?zoo, a zoo object is a numeric matrix, numeric vector
or factor together with an ordered time index whose values are unique.
It's not clear that that is what you have; however, if we can assume
that within a single stream each value of param has a unique set of
dates, then quant could form a multivariate zoo series with a Date
index and one column per param.  We used text = Lines in read.zoo
below to keep the example self-contained, but in reality the first
argument to read.zoo would be something like "myfile.dat" to refer to
the file holding the data.  skip = 1 skips the header line, which has
one fewer name than there are columns.  The "NULL" entries in the
colClasses argument of read.zoo cause the respective columns (here the
row numbers and stream) to be ignored, and split = 2 splits the series
on the second remaining column, param.

Lines <- "stream   sampdate param   quant
11164 Winters 2010-06-30   SO4 120.000
11165 Winters 2010-06-30    Zn   0.010
11166 Winters 2011-06-06    As   0.005
11167 Winters 2011-06-06    Cl   5.000
11168 Winters 2011-06-06   SO4 150.000
11169 Winters 2011-06-06    Zn   0.010"

library(zoo)
packageVersion("zoo") # should be >= 1.7-4

z <- read.zoo(text = Lines, skip = 1, split = 2,
  colClasses = c("NULL", "NULL", NA, NA, NA))

which gives

> z
              As Cl SO4   Zn
2010-06-30    NA NA 120 0.01
2011-06-06 0.005  5 150 0.01
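
If the data are already in a data frame, as yours are, read.zoo will
also accept the data frame directly in place of a file name.  What
follows is only a sketch: DF is a small made-up stand-in for
streamdata, and ndup and z2 are names chosen for illustration.  It
first checks the uniqueness assumption mentioned above and then
builds the same kind of object as z for one stream.

```r
library(zoo)

# Hypothetical stand-in for the poster's streamdata data frame
DF <- data.frame(
  stream   = rep("Winters", 6),
  sampdate = as.Date(c("2010-06-30", "2010-06-30", "2011-06-06",
                       "2011-06-06", "2011-06-06", "2011-06-06")),
  param    = c("SO4", "Zn", "As", "Cl", "SO4", "Zn"),
  quant    = c(120, 0.01, 0.005, 5, 150, 0.01),
  stringsAsFactors = FALSE
)

# 0 means no (stream, sampdate, param) combination occurs more than once;
# otherwise it is the row number of the first repeated combination
ndup <- anyDuplicated(DF[c("stream", "sampdate", "param")])

# Keep one stream, drop the stream column so that sampdate is column 1
# (the index), and spread param into columns
z2 <- read.zoo(DF[DF$stream == "Winters", c("sampdate", "param", "quant")],
               split = "param")
```

If ndup is not 0 then some dates are repeated within a stream and
param, and those rows would have to be aggregated or otherwise dealt
with before a zoo object can be formed.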

Read over ?zoo and ?read.zoo and also the 5 vignettes.  The zoo-read
vignette is entirely about read.zoo.  If you really do want to keep
all that info (stream as well as param and sampdate) you might want
to use a data frame instead, or possibly several zoo objects, e.g.
one per stream.
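
One possible way to get several zoo objects, sketched under the
assumption that one object per stream is what is wanted, is to split
the data frame by stream and run read.zoo on each piece.  DF below is
made-up data with the same column names as streamdata, and zlist is a
name chosen for illustration.

```r
library(zoo)

# Hypothetical long-format data with the same columns as streamdata
DF <- data.frame(
  stream   = c("Burns", "Burns", "Burns", "Winters", "Winters", "Winters"),
  sampdate = as.Date(c("1987-07-23", "1987-07-23", "1987-09-17",
                       "2010-06-30", "2011-06-06", "2011-06-06")),
  param    = c("As", "Zn", "As", "SO4", "As", "SO4"),
  quant    = c(0.01, 0.02, 0.01, 120, 0.005, 150),
  stringsAsFactors = FALSE
)

# Split the three non-stream columns by stream, then turn each piece into
# a wide zoo object indexed by sampdate with one column per param
zlist <- lapply(split(DF[c("sampdate", "param", "quant")], DF$stream),
                read.zoo, split = "param")
```

Each element of zlist can then be used on its own, e.g.
plot(zlist$Winters), and each stream only gets columns for the params
that were actually measured in it.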

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


