[R] aggregate.zoo on bivariate data

Gabor Grothendieck ggrothendieck at gmail.com
Tue Aug 9 16:10:19 CEST 2011


On Tue, Aug 9, 2011 at 9:57 AM, Johannes Egner <johannes.egner at gmail.com> wrote:
> On Mon, Aug 8, 2011 at 6:44 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> On Mon, Aug 8, 2011 at 9:16 AM, Johannes Egner <johannes.egner at gmail.com> wrote:
>>> Hi,
>>>
>>> I'm removing non-unique time indices in a zoo time series by means of
>>> aggregate. The time series is bivariate, and the row to be kept only depends
>>> on the maximum of one of the two columns. Here's an example:
>>>
>>> x <- zoo(rbind( c(1,1), c(1.1, 0.9), c(1.1, 1.1), c(1,1) ),
>>>        order.by=c(1,1,2,2))
>>>
>>> The eventual aggregated result should be
>>>
>>> 1   1.1   0.9
>>> 2   1.1   1.1
>>>
>>> that is, in each slice of the underlying data (a slice being all rows with
>>> the same time stamp), we take the row that has maximum value in the first
>>> column. (For the moment, let's not worry about several rows within the same
>>> slice having the same maximum value in the first column.)
>>>
>>> I have tried subsetting x by
>>>
>>> slices <- aggregate(x[,1], by=identity, FUN=which.max)
>>>
>>> but ended up with something as ugly as:
>>>
>>> T <- length( unique(time(x)) )
>>> result <- zoo( matrix(NA, ncol=2, nrow=T), order.by=unique(time(x)) )
>>>
>>> for(t in seq(length.out=T))
>>> {
>>>    result[t,] <- x[ time(x)==time(slices[t]) ][coredata(slices[t]),]
>>>
>>> }
>>>
>>> There must be a better way of doing this -- maybe using tapply or the plyr
>>> package, but possibly something much simpler. Any pointers are very welcome.
>>
>> Where does the data come from in the first place?  Is it being read
>> in?  or is it in a data frame that is converted to a zoo object?
>
> We can assume the most convenient choice, really. Technically, I'm
> reading three equi-sized vectors (timestamps, first column, second
> column) from respective rdata-files, cbind the data together, and then
> make them a zoo object by ordering with the timestamps. (Hence my
> example, which mimics the situation.)
>

The reason I ask is that this is usually done when importing the data
into zoo (rather than importing the data with duplicates and then
removing them later).  In that case, suppose we start with the DF shown
below (built from your x object).  The following read.zoo call then
performs the required import and the aggregation in one step:

DF <- data.frame(time = time(x), coredata(x))
z <- read.zoo(DF[order(DF$time, DF$X1), ], aggregate = function(x) tail(x, 1))
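For completeness, here is the whole pipeline on your example data as a
minimal sketch (X1 and X2 are just the default column names that
data.frame assigns to an unnamed matrix):

```r
library(zoo)

# the example data: duplicate time stamps at 1 and 2
x <- zoo(rbind(c(1, 1), c(1.1, 0.9), c(1.1, 1.1), c(1, 1)),
         order.by = c(1, 1, 2, 2))

DF <- data.frame(time = time(x), coredata(x))  # columns: time, X1, X2

# Sorting by time and then by X1 puts the max-X1 row last within each
# time slice; aggregate = tail(x, 1) then keeps exactly that row's
# value in every column.
z <- read.zoo(DF[order(DF$time, DF$X1), ], aggregate = function(x) tail(x, 1))

# z has one row per time stamp: (1.1, 0.9) at time 1 and (1.1, 1.1)
# at time 2, which is the desired result.
```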

As for suppressing the warnings about duplicates: as long as the
duplicates are removed at the time of import, it's not an issue, since
the situation that triggers such warnings never arises.
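If the data is already sitting in a zoo object, the same idea can be
expressed without read.zoo, using base R's order() and duplicated().
This is a sketch along the same lines (again, X1/X2 are data.frame's
default names):

```r
library(zoo)

x <- zoo(rbind(c(1, 1), c(1.1, 0.9), c(1.1, 1.1), c(1, 1)),
         order.by = c(1, 1, 2, 2))

DF <- data.frame(time = time(x), coredata(x))   # columns: time, X1, X2
DF <- DF[order(DF$time, DF$X1), ]               # max X1 comes last in each slice
keep <- !duplicated(DF$time, fromLast = TRUE)   # last row of each time slice

# rebuild a zoo object with unique time stamps
z <- zoo(as.matrix(DF[keep, -1]), order.by = DF$time[keep])
```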

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


