[R] Multiple time series with zoo

Gabor Grothendieck ggrothendieck at gmail.com
Tue Nov 1 12:56:11 CET 2011


On Mon, Oct 31, 2011 at 11:27 PM, Vinny Moriarty <vwmoriarty at gmail.com> wrote:
> Thanks for everyone's input so far, it is greatly appreciated. But I've got
> one last task I could use some advice on
>
>
> Here are the first few lines of my data set:
>
> site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temperature_c
> 06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
> 06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56
> 06,2006-04-09 11:00:00,2006-04-09 21:00:00,BAK,sb39, 2, 29.51
> 06,2006-04-09 11:20:00,2006-04-09 21:20:00,BAK,sb39, 10, 29.53
> 06,2006-04-09 11:40:00,2006-04-09 21:40:00,BAK,sb39, 2, 29.57
> 06,2006-04-09 12:00:00,2006-04-09 22:00:00,BAK,sb39, 2, 29.60
> 06,2006-04-09 12:20:00,2006-04-09 22:20:00,BAK,sb39, 2, 29.66
> 06,2006-04-09 12:40:00,2006-04-09 22:40:00,BAK,sb39, 2, 29.68
> 06,2006-04-09 13:00:00,2006-04-09 23:00:00,BAK,sb39, 10, 29.68
> 06,2006-04-09 13:20:00,2006-04-09 23:20:00,BAK,sb39, 2, 29.71
> 06,2006-04-09 13:40:00,2006-04-09 23:40:00,BAK,sb39, 2, 29.68
> 06,2006-04-09 14:00:00,2006-04-10 00:00:00,BAK,sb39, 10, 29.49
> 06,2006-04-09 14:20:00,2006-04-10 00:20:00,BAK,sb39, 2, 29.31
> 06,2006-04-09 14:40:00,2006-04-10 00:40:00,BAK,sb39, 10, 29.27
>
> My goal was to extract all of the 10m data (all of the "10"'s from the
> "sensor_depth_m" column) and than calculate daily averages.
>
>
> With the help from this forum I came up with this:
>
> library(zoo)
> Data=read.table("06_BottomMountThermistors.csv",sep=",",header=TRUE,as.is
> =TRUE)
> Ten=subset(Data,sensor_depth_m==10L)
>  z=zoo(Ten$temperature_c,Ten$time_local)
> Warning message:
> In zoo(Ten$temperature_c, Ten$time_local) :
>  some methods for “zoo” objects do not work if the index entries in ‘
> order.by’ are not unique
>  ag=aggregate(zoo,as.Date,mean)
> write.csv(ag,file="LTER_6_10m.csv")
>
>
> Which works fine. I'm not sure why I get the error concerning unique
> entries as all of the 10m "time local" data is sequential and thus unique.
> Certainly some of the "temperature_c" data is repeated, but my
> understanding of a zoo object is that I have the "time_local" column set up
> as the order.by index.
> So any thoughts on the warning message and my understanding of zoo objects
> would be appreciated.
>
> But the last task I have for this data is to average several of these data
> sets together. My thoughts were to run the code as above for 6 different
> sites (column 1 is the "site" index). I still think in excel, so I was
> planning on lining up all 6 sites in a spreadsheet so that the dates (daily
> means from the above code) line up, and then just averaging the data like
> so:
>
>
> Date, Site1, Site2, Site3, Site4, Site5, Site6   Average
> 2006-04-09 ,20,19,20,19,14,12,average(Site1-6)
> 2006-04-10,12,13,14,15,16,12 ,average(Site1-6)
> 2006-04-11,12,12,12,13,12,12 , average(Site1-6)
> 2006-04-12,12,13,13,12,12,12, average(Site1-6)
>
>
> But I figure R can do this for me, so why bother going back to excel when R
> is turning out to be a way better way to work with this kind of data. I
> tried using merge, but I don't think this is the right command.
> So is there anyway I can have R  take 6 different data sets, line them up
> by date, and pull a grand average by day for all 6 sites?
>
>

The solution is basically only three lines: read.zoo, grep out the
required columns and append the average. A fourth statement to clean
up the column names would be optional.

1. The read.zoo command below reads all the files indicated by the
glob, ignoring the "NULL" columns (colClasses=...), splits each by
sensor_depth_m (split=2), takes only the Date part of the date/time
(FUN=as.Date) and aggregates (aggregate=mean) using mean over the rows
with the same Date.  It then arranges the result, z, with daily rows
that look like this so that there is one column for each sensor
depth/file combination:
> z
           2.data1.txt 10.data1.txt 2.data2.txt 10.data2.txt
2006-04-09      29.591      29.4925      29.591      29.4925

2. Then we reduce that to just the 10 columns using grep yielding z10

3. append an overall average which looks like this:
> z10
           10.data1.txt 10.data2.txt Average
2006-04-09      29.4925      29.4925 29.4925

4. It would be possible to add an optional fourth statement which
would be as follows in this sample case in order to get prettier
column names after creating z10.  The statement will vary depending on
the precise form of your file names so this line will work with the
sample here since we are using filenames of data1.txt and data2.txt
but you will may need to adjust it to your file names if they are
substantially different.  Here we look for column names that start
with 10 followed by a dot [.] and then some characters .* and then a
dot [.] and other characters .* and replace that with just the
parenthesized portion \\1:
colnames(z10) <- sub("10[.](.*)[.].*", "\\1", colnames(z10))


# generate test data
Lines <- "site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temperature_c
06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56
06,2006-04-09 11:00:00,2006-04-09 21:00:00,BAK,sb39, 2, 29.51
06,2006-04-09 11:20:00,2006-04-09 21:20:00,BAK,sb39, 10, 29.53
06,2006-04-09 11:40:00,2006-04-09 21:40:00,BAK,sb39, 2, 29.57
06,2006-04-09 12:00:00,2006-04-09 22:00:00,BAK,sb39, 2, 29.60
06,2006-04-09 12:20:00,2006-04-09 22:20:00,BAK,sb39, 2, 29.66
06,2006-04-09 12:40:00,2006-04-09 22:40:00,BAK,sb39, 2, 29.68
06,2006-04-09 13:00:00,2006-04-09 23:00:00,BAK,sb39, 10, 29.68
06,2006-04-09 13:20:00,2006-04-09 23:20:00,BAK,sb39, 2, 29.71
06,2006-04-09 13:40:00,2006-04-09 23:40:00,BAK,sb39, 2, 29.68
06,2006-04-09 14:00:00,2006-04-10 00:00:00,BAK,sb39, 10, 29.49
06,2006-04-09 14:20:00,2006-04-10 00:20:00,BAK,sb39, 2, 29.31
06,2006-04-09 14:40:00,2006-04-10 00:40:00,BAK,sb39, 10, 29.27"
cat(Lines, "\n", file = "data1.txt")
cat(Lines, "\n", file = "data2.txt")

library(zoo)
z <- read.zoo(Sys.glob("data*.txt"), header = TRUE, sep = ",", split = 2,
	colClasses = c("NULL", NA, "NULL", "NULL", "NULL", NA, NA),
	FUN = as.Date, aggregate = mean)
z10 <- z[, grep("^10[.]", colnames(z))]
z10$Average <- rowMeans(z10)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list