[R] templated use of aggregate

David Winsemius dwinsemius at comcast.net
Wed Jun 13 16:18:22 CEST 2012


On Jun 13, 2012, at 10:09 AM, Matthew Johnson wrote:

> my sessioninfo was as follows:
>
>> sessionInfo()
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> [7] base
>
> other attached packages:
> [1] xts_0.8-2 zoo_1.7-6
>
> loaded via a namespace (and not attached):
> [1] grid_2.14.1    lattice_0.20-0
>
>
> i have now updated to R 2.15, and my session info is:
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> [7] base
>
> other attached packages:
> [1] xts_0.8-6 zoo_1.7-7
>
> loaded via a namespace (and not attached):
> [1] grid_2.15.0    lattice_0.20-6 tools_2.15.0
>
> For the xts object mn, your suggestion still fails with an error:
>
>> adf <- aggregate(mn[,-1]~mn[,1], data=mn, sum); adf
> Error in aggregate.formula(mn[, -1] ~ mn[, 1], data = mn, sum) :
>  'names' attribute [3] must be the same length as the vector [1]
>
> however when i convert to a zoo with
>
>> mnz <- as.zoo(mn)
>
> I get some errors, but it works

Those are warnings, ... not errors.
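
The zoo warnings just flag that the index contains repeated time
stamps, so index-based zoo methods may misbehave. A two-point toy
example (made-up data) that triggers the same message:

library(zoo)
zoo(1:2, order.by = as.POSIXct(rep("2012-06-01 09:30:00", 2), tz = "GMT"))
# warns: some methods for "zoo" objects do not work if the index
# entries in 'order.by' are not unique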

>
>> adf <- aggregate(mnz[,-1]~mnz[,1], data=mnz, sum); adf
> Warning messages:
> 1: In zoo(rval, index(x)[i]) :
>  some methods for “zoo” objects do not work if the index entries in
> ‘order.by’ are not unique
> 2: In zoo(rval, index(x)[i]) :
>  some methods for “zoo” objects do not work if the index entries in
> ‘order.by’ are not unique
> 3: In zoo(rval[i], index(x)[i]) :
>  some methods for “zoo” objects do not work if the index entries in
> ‘order.by’ are not unique
> 4: In zoo(rval[i], index(x)[i]) :
>  some methods for “zoo” objects do not work if the index entries in
> ‘order.by’ are not unique
> 5: In zoo(xc[ind], ix[ind]) :
>  some methods for “zoo” objects do not work if the index entries in
> ‘order.by’ are not unique
> 6: In zoo(xc[ind], ix[ind]) :
>  some methods for “zoo” objects do not work if the index entries in
> ‘order.by’ are not unique
>  mnz[, 1] mnz[, -1]
> 1    97.90       408
> 2    97.89       208
>
> So is this a bug in xts?

It does look that way to me. The correct way to report this is to
contact the package maintainer (copied on this message), although I
did notice that Joshua Ulrich already looked at this posting on SO,
and he is on the xts development team. You should have included this
at the beginning of your code:

library(xts)
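
FWIW, a workaround that sidesteps both the xts formula error and the
zoo duplicate-index warnings is to drop the index before aggregating.
A sketch, untested against your full data:

df  <- as.data.frame(coredata(mn))   # plain data.frame; index discarded
adf <- aggregate(df[-1], by = df[1], FUN = sum)  # no column names needed

The data.frame method of aggregate() never sums the grouping column,
which should also help with your "templated" question.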

-- 
David.
>
> thanks for your patience
>
> mj
>
> On 13 June 2012 15:53, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Jun 13, 2012, at 9:38 AM, Matthew Johnson wrote:
>>
>>> Sorry, i'll try and put more flesh on the bones.
>>>
>>> please note, i changed the data in the example, as fiddling has  
>>> raised
>>> another question that's best illustrated with a slightly different
>>> data set.
>>>
>>> first of all, when i do as you suggest, i obtain the following  
>>> error:
>>>
>>>> PxMat <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum)
>>>
>>>
>>> Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) :
>>>  'names' attribute [3] must be the same length as the vector [1]
>>
>>
>> Very strange. When I just did it with the structure you (cross-)  
>> posted on
>> SO I got:
>>
>>> adf <- aggregate(mm[,-1]~mm[,1], data=mm, sum); adf
>> snipped warning messages
>>  mm[, 1] mm[, -1]
>> 1   97.91      538
>> 2   97.92      918
>>
>> I had earlier tested it with a zoo object I had constructed and did  
>> it again
>> with the structure below.
>>
>>  mm[, 1] mm[, -1]
>> 1   97.91      538
>> 2   97.92      918
>>
>> I'm using zoo_1.7-6 and R version 2.14.2 on a Mac. I do not  
>> remember you
>> posting the requested information about your versions.
>>
>> --
>> David.
>>
>>
>>>
>>> my data is an xts object, and it looks like this:
>>>
>>>                    px_ym1 vol_ym1
>>> 2012-06-01 09:30:00  97.90       9
>>> 2012-06-01 09:30:00  97.90      60
>>> 2012-06-01 09:30:00  97.90      71
>>> 2012-06-01 09:30:00  97.90       5
>>> 2012-06-01 09:30:00  97.90       3
>>> 2012-06-01 09:30:00  97.90      21
>>> 2012-06-01 09:31:00  97.90       5
>>> 2012-06-01 09:31:00  97.89     192
>>> 2012-06-01 09:31:00  97.89      65
>>> 2012-06-01 09:31:00  97.89      73
>>> 2012-06-01 09:31:00  97.89       1
>>> 2012-06-01 09:31:00  97.89       1
>>> 2012-06-01 09:31:00  97.89      39
>>> 2012-06-01 09:31:00  97.90      15
>>> 2012-06-01 09:31:00  97.90       1
>>> 2012-06-01 09:31:00  97.89       1
>>> 2012-06-01 09:31:00  97.90      18
>>> 2012-06-01 09:31:00  97.89       1
>>> 2012-06-01 09:32:00  97.89      33
>>> 2012-06-01 09:34:00  97.89       1
>>> 2012-06-01 09:34:00  97.89       1
>>>
>>> dput(mn) returns:
>>>
>>>> dput(mn)
>>>
>>> structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89,
>>> 97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89,
>>> 97.89, 97.89, 97.89, 9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1,
>>> 1, 39, 15, 1, 1, 18, 1, 33, 1, 1), .indexCLASS = c("POSIXct",
>>> "POSIXt"), .indexTZ = "GMT", class = c("xts", "zoo"), index =
>>> structure(c(1338543000,
>>> 1338543000, 1338543000, 1338543000, 1338543000, 1338543000,  
>>> 1338543060,
>>> 1338543060, 1338543060, 1338543060, 1338543060, 1338543060,  
>>> 1338543060,
>>> 1338543060, 1338543060, 1338543060, 1338543060, 1338543060,  
>>> 1338543120,
>>> 1338543240, 1338543240), tzone = "GMT", tclass = c("POSIXct",
>>> "POSIXt")), .Dim = c(21L, 2L), .Dimnames = list(NULL, c("px_ym1",
>>> "vol_ym1")))
>>>
>>> as you can see, this is an xts object that contains dates,
>>> prices and volumes. There is much more data over a long time period,
>>> and i'm interested in various subsetting and aggregation operations.
>>>
>>> I would like to split the data by time period and aggregate the  
>>> data,
>>> such that i obtain a table which reports the volume traded at each
>>> price, for each of the time-period splits that i have chosen.
>>>
>>> I have employed the following approach:
>>>
>>> PxMat <- aggregate(.~px_ym1, data=mn, sum)
>>>
>>>
>>> which yields:
>>>
>>>  px_ym1 vol_ym1
>>> 1  97.89     408
>>> 2  97.90     208
>>>
>>> and for subsets, i use the following grouping:
>>>
>>>> PxMat30 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '30'], sum)
>>>
>>>
>>> Which yields:
>>>
>>>  px_ym1 vol_ym1
>>> 1   97.9     169
>>>
>>> and
>>>
>>>> PxMat31 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '31'], sum)
>>>
>>>
>>> which yields:
>>>
>>>  px_ym1 vol_ym1
>>> 1  97.89     373
>>> 2  97.90      39
>>>
>>> and so on and so forth for each minute.
>>>
>>> when i try and sub-set using general notation, as follows:
>>>
>>> PxMat <- aggregate(.~mn[,1], data=mn, sum)
>>>
>>> this yields a different form of output:
>>>
>>> px_ym1  px_ym1 vol_ym1
>>> 1  97.90 1076.79     408
>>> 2  97.89  979.00     208
>>>
>>> the problem is that i now have the sum of the px_ym1 data (the sum  
>>> of
>>> mn[,1]).
>>>
>>> hopefully things are now clearer - sorry to have wasted your time up
>>> until now.
>>>
>>> assuming that i have now made my situation clear, i hope you can
>>> help with four specific questions.
>>>
>>> 1/ My data-sets are HUGE, so speed is an issue - is this the fastest
>>> way to sub-set and aggregate an xts?
>>>
>>> 2/ is there a way to do this for multiple splits? say a table for  
>>> each
>>> minute, day, week, or month? the return would potentially be a list
>>> with a table for each day / minute etc showing volume traded at each
>>> price -- but it doesn't have to be a list ...
>>>
>>> i am writing a function with loops that would generate a table that
>>> reports volume traded at each price for each case of a specified  
>>> time
>>> split (say for four tables, one for each minute in the example data,
>>> returned as a list). my solution is slow; it seems like something
>>> that someone would have done better already. is this the case?
>>>
>>> 3/ is there a way to do the sub-setting with templated variables? i
>>> would like to obtain the table i get with the named aggregate
>>> functions (reproduced above) with multiple data frames, as the  
>>> column
>>> names will differ from time to time. i cannot figure out how to stop
>>> the command from summing the mn[,1] column when i stop using  
>>> variable
>>> names.
>>>
>>> 4/ on a related note, is it possible to apply different functions to
>>> different columns of data? It would be nice, for example, if the  
>>> table
>>> returned from an aggregate command could be made to be:
>>>
>>> px_ym1  count vol_ym1
>>> 1  97.90  11     408
>>> 2  97.89  10     208
>>>
>>> where we have the price traded, the number of trades (a count of
>>> px_ym1 / mn[,1]), and the sum of vol_ym1 (mn[,2]).
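>>>
>>> for 2/ to 4/, i imagine something along these lines might work
>>> (untested; 'minute' is just an illustrative grouping column):
>>>
>>> df <- data.frame(minute = .indexmin(mn), coredata(mn))
>>> # trade count and total volume at each price, within each minute
>>> aggregate(vol_ym1 ~ minute + px_ym1, data = df,
>>>           FUN = function(v) c(count = length(v), vol = sum(v)))
>>> # a name-agnostic version of the same grouping, for 3/:
>>> aggregate(df[3], by = df[1:2], FUN = sum)
>>> # for 1/, rowsum(df[3], group = df$px_ym1) may be faster for plain sums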
>>>
>>> thanks and best regards
>>>
>>> matt johnson
>>>
>>> On 13 June 2012 15:06, David Winsemius <dwinsemius at comcast.net>  
>>> wrote:
>>>>
>>>>
>>>>
>>>> On Jun 12, 2012, at 11:32 PM, Matthew Johnson wrote:
>>>>
>>>>> Dear R-help,
>>>>>
>>>>> I have an xts data set that i have subset by date.
>>>>>
>>>>> now it contains a date-time-stamp, and two columns (price and  
>>>>> volume
>>>>> traded): my objective is to create tables of volume traded at a  
>>>>> price -
>>>>> and
>>>>> i've been successfully using aggregate to do so in interactive  
>>>>> use.
>>>>>
>>>>> say the data looks as follows:
>>>>>
>>>>>                  px_ym1 vol_ym1
>>>>> 2012-06-01 09:37:00  97.91     437
>>>>> 2012-06-01 09:37:00  97.91      64
>>>>> 2012-06-01 09:37:00  97.91       1
>>>>> 2012-06-01 09:37:00  97.91       5
>>>>> 2012-06-01 09:37:00  97.91       5
>>>>> 2012-06-01 09:37:00  97.92     174
>>>>> 2012-06-01 09:37:00  97.92      64
>>>>> 2012-06-01 09:37:00  97.92     125
>>>>> 2012-06-01 09:37:00  97.92     124
>>>>> 2012-06-01 09:37:00  97.92      64
>>>>> 2012-06-01 09:37:00  97.92     109
>>>>> 2012-06-01 09:37:00  97.92      64
>>>>> 2012-06-01 09:37:00  97.92      19
>>>>> 2012-06-01 09:37:00  97.92      45
>>>>> 2012-06-01 09:37:00  97.92      75
>>>>> 2012-06-01 09:37:00  97.92       3
>>>>> 2012-06-01 09:37:00  97.92      47
>>>>> 2012-06-01 09:37:00  97.91      26
>>>>> 2012-06-01 09:37:00  97.92       4
>>>>> 2012-06-01 09:37:00  97.92       1
>>>>>
>>>>> the following gives me what i'm looking for:
>>>>>
>>>>>> adf <- aggregate(.~px_ym1, data=mm, sum)
>>>>>
>>>>>
>>>>>
>>>>> which is this table:
>>>>>
>>>>> px_ym1 vol_ym1
>>>>> 1  97.91     538
>>>>> 2  97.92     918
>>>>>
>>>>> however now i'm trying to code it to run automatically, using the
>>>>> templated version:
>>>>>
>>>>>> adf <- aggregate(.~mm[,1], data=mm, sum)
>>>>
>>>>
>>>>
>>>> Did you try:
>>>>
>>>> adf <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum)
>>>> adf
>>>>
>>>> I would have used names:
>>>>
>>>> adf <- aggregate(vol_ym1 ~ px_ym1, data=mm, sum)
>>>> adf
>>>>
>>>>
>>>>
>>>>> yields the following - which contains what i'd like, but it has
>>>>> also summed across the price column (not ideal).
>>>>>
>>>>>  px_ym1  px_ym1 vol_ym1
>>>>> 1  97.91  587.46     538
>>>>> 2  97.92 1370.88     918
>>>>>
>>>>> how do i code this so that i can enter an xts data-frame with  
>>>>> arbitrary
>>>>> names and still obtain the table with only the information i  
>>>>> desire?
>>>>
>>>>
>>>>
>>>> That is too far to the vague side of the vague-specific continuum.
>>>>
>>>>
>>>>>
>>>>> on a related point, is there a way to combine the two steps?
>>>>
>>>>
>>>>
>>>> Er, which two steps would that be?
>>>>
>>>>
>>>>> the function
>>>>> i've written splits by date and then returns a list containing
>>>>> data-frames
>>>>> that report the volume traded at each price on each date
>>>>>
>>>>> - am i re-inventing the wheel here? is there a canned function
>>>>> that does this?
>>>>>
>>>>> thanks + best regards
>>>>>
>>>>> matt johnson
>>>>
>>>>
>>>>
>>>>
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>

David Winsemius, MD
West Hartford, CT


