[R] uniq -c

Sam Steingold sds at gnu.org
Wed Oct 17 21:57:56 CEST 2012


> * Sam Steingold <fqf at tah.bet> [2012-10-16 11:03:27 -0400]:
>
> I need an analogue of "uniq -c" for a data frame.

Summary of options:

1. William:

isFirstInRun <- function(x) UseMethod("isFirstInRun")
isFirstInRun.default <- function(x) c(TRUE, x[-1] != x[-length(x)])
isFirstInRun.data.frame <- function(x) {
  stopifnot(ncol(x)>0)
  retval <- isFirstInRun(x[[1]])
  for(column in x) {
    retval <- retval | isFirstInRun(column)
  }
  retval
}
row.count.1 <- function (x) {
  i <- which(isFirstInRun(x))
  data.frame(x[i,], count=diff(c(i, 1L+nrow(x))))
}

147 seconds

2. http://orgmode.org/worg/org-contrib/babel/examples/Rpackage.html#sec-6-1
row.count.2 <- function (x) {
  equal.to.previous <- rowSums( x[2:nrow(x),] != x[1:(nrow(x)-1),] )==0
  tf.runs <- rle(equal.to.previous)
  counts <- c(1, unlist(mapply(function(x,y) if (y) x+1 else (rep(1,x)),
                               tf.runs$length, tf.runs$value)))
  counts <- counts[ c( diff( counts ) <= 0, TRUE ) ]
  unique.rows <- which( c(TRUE, !equal.to.previous ) )
  cbind(x[ unique.rows, ,drop=FALSE ], counts)
}

136 seconds

3. Micael: paste/strsplit

row.count.3 <- function (x) {
  pa <- do.call(paste,x)
  rl <- rle(p)
  sp <- strsplit(as.character(rl$values)," ")
  data.frame(user = sapply(sp,"[",1),
             country = sapply(sp,"[",2),
             language = sapply(sp,"[",3),
             count = rl$length)
}

here I know the columns and rely on absense of spaces in values.

27 seconds.

Thanks to all who answered.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/
http://thereligionofpeace.com http://ffii.org http://camera.org
A slave dreams not of Freedom, but of owning his own slaves.




More information about the R-help mailing list