[R] reading time series csv file with read.zoo issues, then align time stamps

Sun Jun 15 20:39:25 CEST 2014

Goal: get time series data interpolated onto desired time stamps.
I have two or more data sets whose time stamps are spaced anywhere from
5 minutes to 3-5 hours apart.
I want to put all the data on common time stamps, e.g. at "00:05:00"
intervals.

I asked Gabor and got some very good code (zoo aggregate, na.spline,
na.approx), but I'm having trouble getting the csv file read in and converted
to a zoo object so I can try these functions again.  Here is what Gabor sent
last time.

_____________________start of what Gabor sent ______________________
If you are using zoo then the zoo FAQ discusses grids
   http://cran.r-project.org/web/packages/zoo/index.html
and the other 4 vignettes (pdf documents) and reference manual on that
page discuss more.

zoo does not supply its own time classes except where classes are
elsewhere missing.   Its design is completely independent of the time
class and it works with any time class that supports certain methods
(and that includes all popular ones).  See R News 4/1 for more on date
and time classes.

Here is some code:

Lines <- "10/11/2011 23:30:01     432.22
10/11/2011 23:31:17     432.32
10/11/2011 23:35:00     432.32
10/11/2011 23:36:18     432.22
10/11/2011 23:37:18     432.72
10/11/2011 23:39:19     432.23
10/11/2011 23:40:02     432.23
10/11/2011 23:45:00     432.23
10/11/2011 23:45:20     429.75
10/11/2011 23:46:20     429.65
10/11/2011 23:50:00     429.65
10/11/2011 23:51:22     429.75
10/11/2011 23:55:01     429.75
10/11/2011 23:56:23     429.55
10/12/2011 0:00:07      429.55
10/12/2011 0:01:24      429.95
10/12/2011 0:05:00      429.95
10/12/2011 0:06:25      429.85
10/12/2011 0:10:00      429.85
10/12/2011 0:11:26      428.85
10/12/2011 0:15:00      428.85
10/12/2011 0:20:03      428.85
10/12/2011 0:21:29      428.75
10/12/2011 0:25:01      428.75
10/12/2011 0:30:01      428.75
10/12/2011 0:31:31      428.75"

library(zoo)
library(chron)

fmt <- "%m/%d/%Y %H:%M:%S"
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)

z <- read.zoo(text = Lines, index = 1:2, FUN = toChron)

# 5 minute aggregates
m5 <- times("00:05:00")
ag5 <- aggregate(z, trunc(time(z), m5), mean)

# 5 minute spline fit
g <- seq(trunc(start(z), m5), end(z), by = m5)
na.spline(z, xout = g)

# 5 minute linear approx
na.approx(z, xout = g)
________________end of what Gabor sent_________________

My csv data looks like this. When I open the file in Notepad++ I see
the commas.

TimeStamp	Sea_Temperature_F
12/31/2011 13:24:00	52
12/31/2011 16:44:06	52
12/31/2011 20:44:06	53
01/01/2012 00:44:06	53
01/01/2012 04:44:06	53
01/01/2012 08:44:07	54
01/01/2012 12:26:00	54
01/01/2012 12:44:07	53
01/01/2012 16:44:07	53
01/01/2012 20:44:06	54
01/02/2012 00:44:09	54
01/02/2012 04:44:06	55
01/02/2012 08:44:07	55
01/02/2012 12:44:06	56
01/02/2012 13:04:00	56
01/02/2012 16:44:07	57
01/02/2012 20:44:07	58
01/03/2012 00:44:07	58
01/03/2012 04:44:06	59
01/03/2012 08:44:06	59
01/03/2012 10:48:00	59
01/03/2012 12:44:06	58
01/03/2012 16:44:06	58
01/03/2012 20:44:07	59
01/04/2012 00:44:06	59
01/04/2012 04:44:07	58
01/04/2012 08:44:07	58
01/04/2012 12:44:07	57
01/04/2012 15:30:00	57
01/04/2012 16:44:07	57
01/04/2012 20:44:06	57
01/05/2012 00:44:06	57

The R code I'm trying to get working is below. I'm trying to follow the code
Gabor provided, but I'm too embarrassed to ask him directly again.

fmt <- "%M/%D/%Y %H:%M:%S"
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE,
FUN=toChron)

I get errors:

> fmt <- "%M/%D/%Y %H:%M:%S"
> toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
> seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE, FUN=toChron)
Error in paste(d, t) : argument "t" is missing, with no default
> 

If I take "FUN=toChron" out, I get the error below. There are 542 rows of
data.

> seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE)
Error in read.zoo("SampleSeaTempData-2.csv", sep = ",", header = TRUE) : 
  index has 542 bad entries at data rows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 ...
> 

I guess there is too much going on that I don't understand:
- What does the toChron line do?  How are "d" and "t" defined?
- Why does Gabor's read.zoo line have "index=1:2"?
- Why does Gabor's code have "FUN=toChron"?
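
On these three questions, here is a minimal sketch of how the pieces seem to fit together, assuming the file really is comma-separated with the date and time together in the first column. In Gabor's whitespace-separated example the date and time land in two separate columns, so index = 1:2 marks both as the index and read.zoo passes them to FUN as "d" and "t". With sep = "," there is only one index column, which would explain the 'argument "t" is missing' error. The inline data (copied from the first rows of the sample) and the name toChron1 are made up for illustration. Note also that the format codes are case-sensitive: %m is month and %d is day, while %M is minutes and %D is shorthand for a whole %m/%d/%y date.

```r
library(zoo)
library(chron)

# Assumption: comma-separated text, date and time together in column 1.
Lines <- "TimeStamp,Sea_Temperature_F
12/31/2011 13:24:00,52
12/31/2011 16:44:06,52
12/31/2011 20:44:06,53
01/01/2012 00:44:06,53"

# Lowercase %m and %d for month and day.
fmt <- "%m/%d/%Y %H:%M:%S"

# Hypothetical one-argument converter: with a single index column,
# read.zoo hands FUN just that column, so there is no separate "t".
toChron1 <- function(x) as.chron(x, format = fmt)

# For the real file, replace text = Lines with the file name.
seatemp <- read.zoo(text = Lines, sep = ",", header = TRUE, FUN = toChron1)
print(seatemp)
```

If this reads cleanly, the aggregate, na.spline and na.approx lines from Gabor's code should apply to seatemp in the same way they apply to z.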

The idea is to get two or more data streams "converted" to csv files with
exact time stamps and interpolated values, and then, I guess, cbind the data
into one data frame so I can plot them together.
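
A rough sketch of that alignment step, assuming each stream is already a zoo series: interpolate each one onto the same 5-minute grid with na.approx (as in Gabor's code) and combine them with merge rather than cbind on a data frame. The two series, their values, and the use of POSIXct in UTC are made up for illustration.

```r
library(zoo)

# Two hypothetical irregular series, indexed by POSIXct in UTC.
t0 <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")
a <- zoo(c(52, 53, 55), t0 + c(60, 400, 1000))   # offsets in seconds
b <- zoo(c(10, 12, 11), t0 + c(0, 650, 1200))

# Common 5-minute grid spanning both series.
g <- seq(t0, t0 + 1200, by = "5 min")

# Linear interpolation of each series onto the grid points.
a5 <- na.approx(a, xout = g)
b5 <- na.approx(b, xout = g)

# merge() lines the series up by time stamp; grid points outside
# a series' own time range show up as NA in that column.
both <- merge(a5, b5)
print(both)
```

The merged object can be plotted directly with plot(both), or written out with write.zoo if csv files are needed.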

I've read posts about zoo csv reading issues, e.g. getting the seconds
(":00") to appear in the csv file to eliminate duplicate index entries.

Maybe it would be easier/cleaner to read the csv file into a regular R
dataframe and then "convert" to a zoo object?
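
A minimal sketch of that data-frame route, assuming a comma-separated file whose columns are named as in the sample (TimeStamp, Sea_Temperature_F). The inline text stands in for the real file, and the choice of POSIXct with tz = "UTC" is an assumption, not something the data dictates.

```r
library(zoo)

# Assumption: comma-separated text with the sample's column names.
Lines <- "TimeStamp,Sea_Temperature_F
12/31/2011 13:24:00,52
12/31/2011 16:44:06,52
12/31/2011 20:44:06,53
01/01/2012 00:44:06,53"

# Read into an ordinary data frame first (use the file name for real data).
df <- read.csv(text = Lines, stringsAsFactors = FALSE)

# Convert the time stamp strings, then build the zoo object by hand.
tt <- as.POSIXct(df$TimeStamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
seatemp <- zoo(df$Sea_Temperature_F, order.by = tt)
print(seatemp)
```

This splits the job into two steps that can be checked separately: str(df) shows whether the read worked, and any NA in tt points at a format mismatch.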

In my analysis and plotting I use POSIXlt for time.

Help appreciated.  Thanks.
