[R] Counting occurrences of variables in a dataframe

Sat Feb 11 19:59:51 CET 2012

On Sat, Feb 11, 2012 at 07:17:54PM +0100, Kai Mx wrote:
> Hi everybody,
> I have a large dataframe similar to this one:
> knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
> kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
> '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
> format="%Y%m%d")
> kdata <- data.frame (knames, kdate)
> I would like to add a new variable to the dataframe counting the
> occurrences of different values in knames in their order of appearance
> (according to the date as indicated in kdate). The solution should be a
> variable with the values 2,2,1,1,1,2,1,2,1,1. I could do it with a loop,
> but there must be a more elegant way to do this.

Hi.

Is the first 2 in the new variable due to the fact that
the name is "ab" and "ab" at row 5 has an older date? If so,
then try the following:

  # row indices sorted by increasing date
  ind <- order(kdata$kdate)
  # running count 1, 2, ... within each group
  f <- function(x) seq_along(x)
  kdata$x <- ave(seq_len(nrow(kdata)), kdata$knames[ind], FUN=f)[order(ind)]

     knames      kdate x
  1      ab 2011-10-01 2
  2      aa 2011-11-02 2
  3      ac 2010-10-01 1
  4      ad 2010-03-15 1
  5      ab 2010-12-01 1
  6      ac 2011-01-05 2
  7      aa 2010-10-01 1
  8      ad 2011-05-04 2
  9      ae 2011-06-03 1
  10     af 2011-02-01 1

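kdata$knames[ind] orders the names by increasing date, so ave() numbers
the occurrences of each name from oldest to newest. order(ind) is the
inverse permutation of ind, so ave(...)[order(ind)] puts the output of
ave() back into the original row order.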
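
To make the reordering step easier to follow, here is a sketch that splits
the one-liner into its intermediate steps, using the data from the question.
The names cnt_sorted, inv, x and x2 are introduced only for illustration and
are not part of the original code. The last step shows what should be an
equivalent way to undo the sort, by assigning into the positions given by ind.

```r
# Data from the original question.
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

# Step 1: ind[i] is the row that holds the i-th oldest date.
ind <- order(kdata$kdate)

# Step 2: running count of each name, computed in date order.
# cnt_sorted[i] belongs to row ind[i], not to row i.
cnt_sorted <- ave(seq_len(nrow(kdata)), kdata$knames[ind], FUN = seq_along)

# Step 3: order(ind) is the inverse permutation of ind, so indexing with it
# moves every count back to the row it belongs to.
inv <- order(ind)
x <- cnt_sorted[inv]

# Illustrative alternative for step 3 (not from the original post):
# write each count directly into its original row position.
x2 <- integer(nrow(kdata))
x2[ind] <- cnt_sorted

kdata$x <- x
```

Both x and x2 should match the values requested in the question. Names
that share a date with another name (rows 3 and 7 here) are not a problem,
because the counting is done separately within each name.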

Hope this helps.

Petr Savicky.