[R] Yearly statistics

Gabor Grothendieck ggrothendieck at gmail.com
Mon May 28 14:34:40 CEST 2007


Here are a couple of solutions:

1. Using the zoo package

First add Date to the header so that the number of column headers
matches the number of columns, then read the data in using read.zoo.
Then aggregate over years using mean; the mean of a logical vector is
the proportion of TRUE values.  For more on zoo try
library(zoo); vignette("zoo"), and for more on dates see the
Help Desk article in R News 4/1.

# added Date to the header
Lines <- "Date open  high   low    close  hc  lc
2004-12-29 4135 4135 4106  4116  8 -21
2004-12-30 4120 4131 4115  4119 15  -1
2004-12-31 4123 4124 4114  4117  5  -5
2005-01-04 4106 4137 4103  4137 20 -14
2005-01-06 4085 4110 4085  4096 10 -15
2005-01-10 4133 4148 4122  4139 15 -11
2005-01-11 4142 4158 4127  4130 19 -12
2005-01-12 4113 4138 4112  4127  18  8
"

library(zoo)

# z <- read.zoo("myfile.dat", header = TRUE)
z <- read.zoo(textConnection(Lines), header = TRUE)

# proportion of days in each year with hc > 0 and lc < 0
aggregate(z[,"hc"] > 0 & z[,"lc"] < 0, function(x) format(x, "%Y"), mean)


2. Using data frames and tapply

Read the data in as a data frame, extract the year from each date,
and use tapply to take the mean by year:

# Lines is from above

# dat <- read.table("myfile.dat", header = TRUE)
dat <- read.table(textConnection(Lines), header = TRUE)

year <- as.numeric(format(as.Date(dat$Date), "%Y"))
tapply(dat$hc > 0 & dat$lc < 0, year, mean)
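
The tapply result can also be put into the Yr/Prop layout shown in the
question, and the same idea may cover the second question about
matching dates. What follows is only a minimal sketch: it assumes B is
a subset of the rows of A, the names A, B, res and match_prop are made
up for illustration, and read.table(text = ) needs a reasonably recent
version of R.

```r
# Same data as above, repeated so the example is self-contained
Lines <- "Date open high low close hc lc
2004-12-29 4135 4135 4106 4116 8 -21
2004-12-30 4120 4131 4115 4119 15 -1
2004-12-31 4123 4124 4114 4117 5 -5
2005-01-04 4106 4137 4103 4137 20 -14
2005-01-06 4085 4110 4085 4096 10 -15
2005-01-10 4133 4148 4122 4139 15 -11
2005-01-11 4142 4158 4127 4130 19 -12
2005-01-12 4113 4138 4112 4127 18 8
"

A <- read.table(text = Lines, header = TRUE)
A$Date <- as.Date(A$Date)
year <- format(A$Date, "%Y")

# yearly proportion as a data frame in the Yr/Prop layout
prop <- tapply(A$hc > 0 & A$lc < 0, year, mean)
res <- data.frame(Yr = as.integer(names(prop)), Prop = as.vector(prop))

# B: hypothetical subset of A (here, the rows meeting the criterion)
B <- A[A$hc > 0 & A$lc < 0, ]

# proportion of A's dates that also occur in B, by year
match_prop <- tapply(A$Date %in% B$Date, year, mean)
```

On the sample data both res$Prop and match_prop come out as 1.0 for
2004 and 0.8 for 2005, since only the last 2005 row has a positive lc.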

On 5/27/07, Alfonso Sammassimo <cincinattikid at bigpond.com> wrote:
> Dear R-experts,
>
> Sorry if I've overlooked a simple solution here. I have calculated the
> proportion of observations which meet a criterion, applied to
> five years of data. How can I break down this proportion statistic for each
> year?
>
> For example (data in zoo format):
>
>            open high  low close hc  lc
> 2004-12-29 4135 4135 4106  4116  8 -21
> 2004-12-30 4120 4131 4115  4119 15  -1
> 2004-12-31 4123 4124 4114  4117  5  -5
> 2005-01-04 4106 4137 4103  4137 20 -14
> 2005-01-06 4085 4110 4085  4096 10 -15
> 2005-01-10 4133 4148 4122  4139 15 -11
> 2005-01-11 4142 4158 4127  4130 19 -12
> 2005-01-12 4113 4138 4112  4127 18   8
>
> Statistic of interest is proportion of times that sign of "hc" is positive
> and sign of "lc" is negative on any given day. Looking to return something
> like:
>
> Yr        Prop
> 2004    1.0
> 2005    0.8
>
> Along these lines, if I have datasets A and B, where B is a subset of A, can
> I use the number of matching dates to calculate the yearly proportions in
> question?
>
> Thanks,
> Alfonso Sammassimo
> Melbourne Australia