[R] time series from timed data

Fri Mar 18 22:51:03 CET 2011

On Fri, Mar 18, 2011 at 2:19 PM, Sam Steingold <sds at gnu.org> wrote:
> Hi,
> I have data with multiple sub-second entries:
>
> 2011/03/15 09:32:15.035619,-0.403103,1.09664,48.6,126.92,117.32
> 2011/03/15 09:32:15.069331,-0.39851,1.09874,48.6,126.92,117.32
> 2011/03/15 09:32:15.289135,-0.402463,1.10084,48.59,126.92,117.32
> 2011/03/15 09:32:15.296110,-0.450244,1.10063,48.59,126.92,117.32
> 2011/03/15 09:32:15.451358,-0.438813,1.10273,48.59,126.93,117.32
> 2011/03/15 09:32:15.476948,-0.428701,1.10482,48.6,126.93,117.32
> 2011/03/15 09:32:15.490478,-0.407603,1.10692,48.6,126.93,117.32
> 2011/03/15 09:32:15.592054,-0.396562,1.10901,48.6,126.93,117.32
> 2011/03/15 09:32:15.592054,-0.388935,1.11111,48.6,126.93,117.32
> 2011/03/15 09:32:15.910516,-0.372265,1.11321,48.6,126.93,117.32
> 2011/03/15 09:32:15.910681,-0.336162,1.1153,48.6,126.93,117.32
> 2011/03/15 09:32:16.684729,-0.335838,1.1174,48.6,126.93,117.32
> 2011/03/15 09:32:17.196003,-0.293883,1.1195,48.6,126.93,117.32
> 2011/03/15 09:32:17.271841,-0.322763,1.11925,48.6,126.93,117.32
> 2011/03/15 09:32:17.271841,-0.329827,1.12134,48.6,126.93,117.32
> 2011/03/15 09:32:17.271842,-0.35095,1.12343,48.6,126.93,117.32
> 2011/03/15 09:32:17.331044,-0.33095,1.12552,48.6,126.93,117.32
> 2011/03/15 09:32:18.793662,-0.34787,1.12762,48.605,126.96,117.33
> 2011/03/15 09:32:18.806576,-0.10088,1.12735,48.62,126.96,117.33
> 2011/03/15 09:32:18.915319,-0.0577731,1.12708,48.62,126.96,117.33
> 2011/03/15 09:32:18.915321,-0.0417141,1.12917,48.62,126.96,117.33
> 2011/03/15 09:32:19.766950,-0.0420244,1.13125,48.62,126.95,117.33
> 2011/03/15 09:32:20.143307,-0.0184469,1.13333,48.62,126.95,117.33
>
> I can load it into a data.frame and convert the first column to POSIXt.
> now I want to convert it to a multi-time series by replacing multiple
> entries for each second with a single entry with mean value.
> (plus one more column with the count).
> e.g., the snippet above would turn into
>
> 2011/03/15 09:32:15,11,-0.4021,1.10545,48.6,126.926,117.32
> 2011/03/15 09:32:16,1,-0.335838,1.1174,48.6,126.93,117.32
> 2011/03/15 09:32:17,5,-0.3257,...
> 2011/03/15 09:32:18,4,...
> 2011/03/15 09:32:19,1,...
> 2011/03/15 09:32:20,1,...
>
> how do I do that?
>

1. Try this. You can ignore the warning message:

> Lines <- "2011/03/15 09:32:15.035619,-0.403103,1.09664,48.6,126.92,117.32
+ 2011/03/15 09:32:15.069331,-0.39851,1.09874,48.6,126.92,117.32
+ 2011/03/15 09:32:15.289135,-0.402463,1.10084,48.59,126.92,117.32
+ 2011/03/15 09:32:15.296110,-0.450244,1.10063,48.59,126.92,117.32
+ 2011/03/15 09:32:15.451358,-0.438813,1.10273,48.59,126.93,117.32
+ 2011/03/15 09:32:15.476948,-0.428701,1.10482,48.6,126.93,117.32
+ 2011/03/15 09:32:15.490478,-0.407603,1.10692,48.6,126.93,117.32
+ 2011/03/15 09:32:15.592054,-0.396562,1.10901,48.6,126.93,117.32
+ 2011/03/15 09:32:15.592054,-0.388935,1.11111,48.6,126.93,117.32
+ 2011/03/15 09:32:15.910516,-0.372265,1.11321,48.6,126.93,117.32
+ 2011/03/15 09:32:15.910681,-0.336162,1.1153,48.6,126.93,117.32
+ 2011/03/15 09:32:16.684729,-0.335838,1.1174,48.6,126.93,117.32
+ 2011/03/15 09:32:17.196003,-0.293883,1.1195,48.6,126.93,117.32
+ 2011/03/15 09:32:17.271841,-0.322763,1.11925,48.6,126.93,117.32
+ 2011/03/15 09:32:17.271841,-0.329827,1.12134,48.6,126.93,117.32
+ 2011/03/15 09:32:17.271842,-0.35095,1.12343,48.6,126.93,117.32
+ 2011/03/15 09:32:17.331044,-0.33095,1.12552,48.6,126.93,117.32
+ 2011/03/15 09:32:18.793662,-0.34787,1.12762,48.605,126.96,117.33
+ 2011/03/15 09:32:18.806576,-0.10088,1.12735,48.62,126.96,117.33
+ 2011/03/15 09:32:18.915319,-0.0577731,1.12708,48.62,126.96,117.33
+ 2011/03/15 09:32:18.915321,-0.0417141,1.12917,48.62,126.96,117.33
+ 2011/03/15 09:32:19.766950,-0.0420244,1.13125,48.62,126.95,117.33
+ 2011/03/15 09:32:20.143307,-0.0184469,1.13333,48.62,126.95,117.33"
>
> library(zoo)
> z <- read.zoo(textConnection(Lines), sep = ",", tz = "")
Warning message:
In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
> secs <- as.POSIXct(trunc(time(z), "sec"))
> cbind(aggregate(z, secs, mean), count = aggregate(z, secs, length)[,1])
                            V2       V3       V4       V5     V6 count
2011-03-15 09:32:15 -0.4021237 1.105450 48.59727 126.9264 117.32    11
2011-03-15 09:32:16 -0.3358380 1.117400 48.60000 126.9300 117.32     1
2011-03-15 09:32:17 -0.3256746 1.121808 48.60000 126.9300 117.32     5
2011-03-15 09:32:18 -0.1370593 1.127805 48.61625 126.9600 117.33     4
2011-03-15 09:32:19 -0.0420244 1.131250 48.62000 126.9500 117.33     1
2011-03-15 09:32:20 -0.0184469 1.133330 48.62000 126.9500 117.33     1

2. An alternative that won't generate a warning message but involves a
double read is:

library(zoo)
toPOSIXct <- function(x) as.POSIXct(trunc(as.POSIXct(x), "sec"))
z <- read.zoo(textConnection(Lines), FUN = toPOSIXct, sep =
",",	aggregate = mean)
zl <- read.zoo(textConnection(Lines), FUN = toPOSIXct, sep = ",",
aggregate = length)
cbind(z, count = zl[,1])

In all cases replace textConnection(Lines) with the "myfile.csv" or whatever.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com