[R] Re: summarizing dataframe

Jim Rogers jrogers at cantatapharm.com
Mon Jan 13 14:44:03 CET 2003


The options I know of are:

1. aggregate (in the base package), with FUN = length. But this converts
character vectors to factors, which is sometimes annoying and sometimes
dangerous.
2. summarize, in the Hmisc package (again, with FUN = length). I find
summarize to be a very useful function in general, but it has a lot of
overhead if all you want is counts, and it is very slow with a large
data frame.
3. Some wrapper that calls tabulate directly. I use:

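table.mat <- function(x) {
  # One key per row: paste all columns together. A separator unlikely to
  # occur in the data keeps ("a b", "c") and ("a", "b c") distinct.
  uid <- do.call("paste", c(unname(as.list(x)), sep = "\r"))
  # Occurrences of each key, in sorted key order (the factor levels).
  count <- tabulate(factor(uid))
  # Sort rows by key and keep the first row of each distinct key.
  x <- x[order(uid), , drop = FALSE]
  i <- !duplicated(sort(uid))
  out <- x[i, , drop = FALSE]
  out$Count <- count
  # Re-order by the original columns (everything except Count).
  last <- length(out)
  o <- do.call("order", unname(as.list(out[-last])))
  out <- out[o, ]
  # Renumber the rows 1..n.
  rownames(out) <- NULL
  out
}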

This is based on my memory of a function that I think Scott Chasalow
wrote and often used. I remember only what the function did, not the
code itself, so Scott may have something a bit better. (I am cc'ing
Scott.)
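
For comparison, here is a minimal sketch of option 1 next to the
key-and-tabulate step that option 3 relies on. The data frame d and its
values are made up for illustration (only the column names spell and loc
come from the question), and the constant column handed to aggregate is
just one assumed way of giving it something to count.

```r
# Toy data: column names follow the question, the values are invented.
d <- data.frame(
  spell = c("a", "a", "b", "b", "b", "c"),
  loc   = c("x", "x", "x", "y", "y", "y"),
  stringsAsFactors = FALSE
)

# Option 1: aggregate() with FUN = length.
# The first argument only supplies one value per row to be counted.
agg <- aggregate(data.frame(count = rep(1L, nrow(d))),
                 by = list(spell = d$spell, loc = d$loc),
                 FUN = length)

# Option 3, step by step: one pasted key per row, then tabulate the keys.
uid   <- paste(d$spell, d$loc, sep = "\r")
count <- tabulate(factor(uid))          # counts, in sorted key order
first <- d[order(uid), ][!duplicated(sort(uid)), ]  # one row per key
tab   <- cbind(first, count = count)
rownames(tab) <- NULL
```

Both agg and tab should hold the same four spell/loc combinations:
(a, x) twice, (b, x) once, (b, y) twice and (c, y) once.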

 
> Message: 16
> From: Alexander.Herr at csiro.au
> To: r-help at stat.math.ethz.ch
> Date: Mon, 13 Jan 2003 14:22:23 +1000
> Subject: [R] summarizing dataframe
> 
> Hi Listers,
> 
> Surely, I just have a mental block and there is a more elegant way of
> creating a summary count (other than extracting it from ftable). I'd
> like to create a new data.frame containing counts of spell by loc,
> i.e. have three columns showing spell, loc, count. Below the
> data.frame...
> 
> Any help appreciated
> Thanks Herry

Jim  

James A. Rogers, Ph.D. <rogers at cantatapharm.com>
Statistical Scientist
Cantata Pharmaceuticals
3-G Gill St
Woburn, MA  01801
617.225.9009
Fax 617.225.9010



