[R] block statistics with POSIX classes

Gabor Grothendieck ggrothendieck at myway.com
Thu Sep 23 19:03:32 CEST 2004


Kahra Hannu <kahra <at> mpsgr.it> writes:

: 
: I have followed Gabor's instructions:
: 
: > aggregate(list(y=y), list(dp$year), mean)$y   # returns NULL since y is a time series
: NULL
:  
: > aggregate(list(y=as.vector(y)), list(dp$year), mean)$y   # returns annual means
: [1]  0.0077656696  0.0224050294  0.0099991898  0.0240550925 -0.0084085867
: [6] -0.0170950194 -0.0355641251  0.0065873997  0.0008253111
: 
: > aggregate(list(y=y), list(dp$year), mean)   # returns the same as the previous one
:   Group.1      Series.1
: 1      96  0.0077656696
: 2      97  0.0224050294
: 3      98  0.0099991898
: 4      99  0.0240550925
: 5     100 -0.0084085867
: 6     101 -0.0170950194
: 7     102 -0.0355641251
: 8     103  0.0065873997
: 9     104  0.0008253111
: 
: Gabor's second suggestion returns different results:
: 
: > aggregate(ts(y, start=c(dp$year[1],dp$mon[1]+1), freq = 12), nfreq=1, mean)
: Time Series:
: Start = 96.33333 
: End = 103.3333 
: Frequency = 1 
:          Series 1
: [1,]  0.016120895
: [2,]  0.024257131
: [3,]  0.007526997
: [4,]  0.017466118
: [5,] -0.016024846
: [6,] -0.017145159
: [7,] -0.036047765
: [8,]  0.014198501
: 
: > aggregate(y, 1, mean) 		# verifies the result above
: Time Series:
: Start = 1996.333 
: End = 2003.333 
: Frequency = 1 
:          Series 1
: [1,]  0.016120895
: [2,]  0.024257131
: [3,]  0.007526997
: [4,]  0.017466118
: [5,] -0.016024846
: [6,] -0.017145159
: [7,] -0.036047765
: [8,]  0.014198501
: 
: The data is from 1996:5 to 2004:8. Must the difference in the results depend on the fact that
: the data does not begin in January and does not end in December? The first two solutions give
: nine annual means, while the last two give only eight. The block size in the last two must be
: 12 months, as stated in ?aggregate, instead of the calendar year that I am looking for.
: Gabor's first suggestion solved my problem.

Yes, that seems to be the case.  Using length instead of mean shows that
the aggregate.data.frame example aggregates by calendar year, whereas the
aggregate.ts example uses successive 12-month periods starting from the
first observation (May 1996) and discards the 4 points at the end
(May to August 2004) that do not fill a complete 12-month period.
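If the data is already a monthly ts, calendar-year blocks can also be had without going back to POSIXlt. The following is a minimal sketch, assuming a series that starts in May 1996 with frequency 12 (simulated with rnorm, as in the example below). It builds an integer year index from start(y) rather than using floor(time(y)), since time() values are floating point and a January could in principle land just below the integer year. The names yr, cal_n, cal_means and blk_n are illustrative only.

```r
# Simulated monthly series, May 1996 to August 2004 (100 months)
set.seed(1)
y <- ts(rnorm(100), start = c(1996, 5), frequency = 12)

# Calendar-year index via integer arithmetic on start(y) = c(1996, 5)
st <- start(y)
yr <- st[1] + (st[2] - 1 + seq_along(y) - 1) %/% 12

cal_n     <- tapply(y, yr, length)   # partial first and last years (8 months each)
cal_means <- tapply(y, yr, mean)     # one mean per calendar year, 1996..2004

# For contrast: aggregate.ts uses consecutive 12-month blocks from May 1996
blk_n <- aggregate(y, nfrequency = 1, FUN = length)

cal_n
blk_n
```

The counts from tapply should agree with the aggregate.data.frame result below (8, seven 12s, 8), while aggregate.ts gives eight blocks of 12.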

R> set.seed(1)
R> dp <- as.POSIXlt(seq(from=as.Date("1996-5-1"), to=as.Date("2004-8-1"), 
+          by="month"))
R> y <- rnorm(length(dp$year))

R> aggregate(list(y=y), list(dp$year), length)$y
[1]  8 12 12 12 12 12 12 12  8

R> aggregate(ts(y, start=c(dp$year[1],dp$mon[1]+1), freq = 12), nfreq=1, 
+          length)
Time Series:
Start = 96.33333 
End = 103.3333 
Frequency = 1 
[1] 12 12 12 12 12 12 12 12
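The Start value of 96.33333 (rather than 1996.333) comes from POSIXlt itself: dp$year counts years since 1900 and dp$mon runs from 0 to 11, which is why the call above adds 1 to the month. A sketch of one way to make aggregate.ts line up with calendar years, assuming only the complete years are of interest: build the ts with a four-digit year, trim it with window() to January 1997 through December 2003, and aggregate that. The names yts, full, ts_means and df_means are illustrative.

```r
set.seed(1)
dp <- as.POSIXlt(seq(from = as.Date("1996-05-01"), to = as.Date("2004-08-01"),
                     by = "month"))
y <- rnorm(length(dp$year))

# POSIXlt offsets: year is years since 1900, mon is 0-based (4 = May)
dp$year[1]
dp$mon[1]

# Monthly ts with a four-digit year and a 1-based month
yts <- ts(y, start = c(dp$year[1] + 1900, dp$mon[1] + 1), frequency = 12)

# Keep only complete calendar years, so the 12-month blocks are Jan..Dec
full     <- window(yts, start = c(1997, 1), end = c(2003, 12))
ts_means <- aggregate(full, nfrequency = 1, FUN = mean)

# Calendar-year means from the data.frame method, restricted to 1997..2003
df_means <- aggregate(list(y = y), list(year = dp$year + 1900), mean)
sel <- df_means$year %in% 1997:2003

all.equal(as.vector(ts_means), df_means$y[sel])
```

Trimming trades away the partial years 1996 and 2004; when those are wanted too, grouping by calendar year as in the aggregate.data.frame call is the simpler route.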
