[R] aggregate.zoo on bivariate data

Johannes Egner johannes.egner at gmail.com
Tue Aug 9 15:57:26 CEST 2011


On Mon, Aug 8, 2011 at 6:44 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Mon, Aug 8, 2011 at 9:16 AM, Johannes Egner <johannes.egner at gmail.com> wrote:
>> Hi,
>>
>> I'm removing non-unique time indices in a zoo time series by means of
>> aggregate. The time series is bivariate, and the row to be kept only depends
>> on the maximum of one of the two columns. Here's an example:
>>
>> x <- zoo(rbind( c(1,1), c(1.1, 0.9), c(1.1, 1.1), c(1,1) ),
>>        order.by=c(1,1,2,2))
>>
>> The eventual aggregated result should be
>>
>> 1   1.1   0.9
>> 2   1.1   1.1
>>
>> that is, in each slice of the underlying data (a slice being all rows with
>> the same time stamp), we take the row that has maximum value in the first
>> column. (For the moment, let's not worry about several rows within the same
>> slice having the same maximum value in the first column.)
>>
>> I have tried subsetting x by
>>
>> slices <- aggregate(x[,1], by=identity, FUN=which.max)
>>
>> but ended up with something as ugly as:
>>
>> T <- length( unique(time(x)) )
>> result <- zoo( matrix(NA, ncol=2, nrow=T), order.by=unique(time(x)) )
>>
>> for(t in seq(length.out=T))
>> {
>>    result[t,] <- x[ time(x)==time(slices[t]) ][coredata(slices[t]),]
>>
>> }
>>
>> There must be a better way of doing this -- maybe using tapply or the plyr
>> package, but possibly something much simpler. Any pointers are very welcome.
>
> Where does the data come from in the first place?  Is it being read
> in?  or is it in a data frame that is converted to a zoo object?

We can assume the most convenient choice, really. Technically, I read
three equal-length vectors (timestamps, first column, second column)
from their respective .rdata files, cbind the data columns together,
and then make a zoo object ordered by the timestamps. (Hence my
example, which mimics the situation.)
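Since the data starts out as three plain vectors, one option is to pick the row for each time stamp before the zoo object is built. The index is then unique from the start, so zoo never warns. The base-R sketch below uses a different technique from the aggregate-based approach: it sorts by time and, within each time stamp, by decreasing first column, then keeps the first row of each time stamp with `!duplicated()`. The names `tt`, `a` and `b` are hypothetical stand-ins for the three vectors read from the .rdata files.

```r
# Hypothetical stand-ins for the three vectors read from the .rdata files
tt <- c(1, 1, 2, 2)      # time stamps (not unique)
a  <- c(1, 1.1, 1.1, 1)  # first column: the one whose maximum decides
b  <- c(1, 0.9, 1.1, 1)  # second column

# Sort by time, and within each time stamp by decreasing first column
o <- order(tt, -a)

# After sorting, the first row of each time stamp has the largest 'a'
keep <- o[!duplicated(tt[o])]

result <- cbind(a, b)[keep, , drop = FALSE]
result_times <- tt[keep]
```

The selected rows can then be turned into a series with `zoo(result, order.by = result_times)`. Because `order()` is stable, ties in the first column keep the earliest row, which is also what `which.max` picks.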

Incidentally, after some thought I have found a neater (and much
faster) way. Via aggregate, each slice reports both the position of
its maximum entry and its length. Adding to each position the total
length of all earlier slices gives that row's index in x, which we
then use to subset:

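#####################################
library(zoo)

x <- zoo(rbind( c(1,1), c(1.1, 0.9), c(1.1, 1.1), c(1,1) ),
         order.by=c(1,1,2,2))

# For each time stamp (slice): position of the maximum of column 1
# within the slice, and the number of rows in the slice
indices.prelim <- aggregate(x[, 1], by=identity,
                            FUN=function(v) c(which.max(v), length(v)))

# Offset of each slice = total number of rows in all earlier slices
cumShift <- cumsum( coredata(indices.prelim[,2]) )
cumShift <- c(0, cumShift[-length(cumShift)])
shift <- coredata(indices.prelim[,1])

# Row index in x of the maximum of each slice
indices <- shift+cumShift
result <- x[indices, ]
#####################################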
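The offset arithmetic can be checked on plain vectors. The sketch below assumes the rows are already sorted by time (as they are inside a zoo object) and swaps in base R's `tapply` for `aggregate` to get each slice's maximum position and length. `tt` and `a` are hypothetical names for the time stamps and the first column. Note that `c(0, cumsum(len)[-n])` and `cumsum(len) - len` are the same exclusive cumulative sum.

```r
# Hypothetical data, already sorted by time as inside a zoo object
tt <- c(1, 1, 2, 2)      # time stamps
a  <- c(1, 1.1, 1.1, 1)  # first column

# Per slice: position of the maximum within the slice, and slice length
pos <- as.vector(tapply(a, tt, which.max))
len <- as.vector(tapply(a, tt, length))

# Offset of each slice = rows in all earlier slices (exclusive cumsum)
cumShift <- c(0L, cumsum(len)[-length(len)])
indices  <- pos + cumShift

# The same offsets written as a one-liner
indices2 <- cumsum(len) - len + pos
```

Here `pos` is `c(2, 1)` and `len` is `c(2, 2)`, so the offsets are `c(0, 2)` and the selected rows are 2 and 3, matching the expected result in the original question.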

Suggestions nonetheless welcome. And Gabor, is there any way to turn
off the warning message zoo gives when the 'order.by' indices are not
unique?
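No zoo option for switching this warning off is assumed here. Base R's condition handling can silence it, though. `suppressWarnings(zoo(...))` drops every warning raised by the call. The helper below is narrower and muffles only warnings whose message contains "not unique". That zoo's warning contains this text is an assumption, so check the wording your zoo version prints. `quietly_nonunique` is a hypothetical name.

```r
# Evaluate 'expr', muffling only warnings whose message mentions
# non-unique entries; all other warnings pass through unchanged.
# ASSUMPTION: zoo's duplicate-index warning contains "not unique".
quietly_nonunique <- function(expr) {
  withCallingHandlers(expr, warning = function(w) {
    if (grepl("not unique", conditionMessage(w), fixed = TRUE))
      invokeRestart("muffleWarning")
  })
}

# Usage with zoo (requires library(zoo)):
# m <- rbind(c(1, 1), c(1.1, 0.9), c(1.1, 1.1), c(1, 1))
# x <- quietly_nonunique(zoo(m, order.by = c(1, 1, 2, 2)))
```

Filtering on the message keeps unrelated warnings from the same call visible, which a blanket `suppressWarnings()` would hide.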


