# [Rd] aggregate.formula

Arni Magnusson arnima at u.washington.edu
Wed May 26 20:55:42 CEST 2004

```
This relates to a message from Christophe Pallier to r-help some time ago.
Like myself, he finds aggregate very useful, but the interface a little
cumbersome. I've implemented a more compact formula interface, found at
the bottom of this message:

data(ToothGrowth)

# I used to aggregate like this:
aggregate(list(len=ToothGrowth$len),
list(supp=ToothGrowth$supp,dose=ToothGrowth$dose), mean)

# Recently, I discovered a slightly shorter call:
with(ToothGrowth, aggregate(list(len=len), list(supp=supp,dose=dose),
mean))

# But aggregate.formula allows:
aggregate(len~supp*dose, data=ToothGrowth, mean)
# as well as subsetting:
aggregate(len~supp*dose, data=ToothGrowth, subset=dose<2, mean)

I use * notation, since the means correspond to aov(len~supp*factor(dose),
data=ToothGrowth), but + notation is also supported and gives the same
grouping. The implementation is probably not top-notch, but I think many R
users would appreciate something like aggregate.formula.
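
# A minimal sketch of the + notation, assuming aggregate.formula from the
# bottom of this message has been sourced. Both operators are only used to
# split the right-hand side into grouping variables, so this call should
# return the same six supp-by-dose means as the * call above:
aggregate(len~supp+dose, data=ToothGrowth, mean)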

Cheers,
Arni

---

"aggregate.formula" <-
function(formula, data=NULL, FUN=mean, subset=TRUE)
########################################################################
###                                                                    #
### Function: aggregate.formula                                        #
###                                                                    #
### Purpose:  Compute summary statistics from a formula                #
###                                                                    #
### Args:     formula is a formula like y~x                            #
###           data is where formula terms are stored, usually a data   #
###             frame, list, or NULL for workspace                     #
###           FUN is a function to compute the summary statistics      #
###           subset is a logical vector specifying which part of the  #
###             data to summarize, or TRUE to include all data         #
###                                                                    #
### Author:   Arni Magnusson <arnima at u.washington.edu>, inspired by    #
###             an R-help message from Christophe Pallier              #
###                                                                    #
### Returns:  Data frame containing summary statistics                 #
###                                                                    #
########################################################################
{
  # Split the formula into the response (left) and grouping terms (right)
  x.str  <- as.character(formula[2])
  by.str <- as.character(formula[3])
  by.str <- unlist(strsplit(by.str, " [\\*\\+] "))
  if(is.null(data))
  {
    x  <- as.data.frame(get(x.str,pos=1))
    by <- as.data.frame(lapply(by.str,get,pos=1))
  }
  else if(is.data.frame(data))
  {
    x  <- data[,x.str,drop=FALSE]
    by <- data[,by.str,drop=FALSE]
  }
  else  # assume list of some sort
  {
    x  <- as.data.frame(data[x.str])
    by <- as.data.frame(data[by.str])
  }
  # Evaluate subset inside data rather than attach()ing it, so nothing is
  # left on the search path if aggregate() fails
  keep <- eval(substitute(subset), data, parent.frame())
  output <- aggregate(x[keep,,drop=FALSE], by[keep,,drop=FALSE], FUN)
  names(output) <- c(by.str, x.str)
  return(output)
}

```
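
The formula-parsing step in the function above can be tried on its own. The following is a minimal sketch, not part of the original message: the formulas `f` and `g` are hypothetical examples, and the `all.vars()` variant is an assumed alternative shown only for comparison with the string-splitting approach.

```r
# Hypothetical example formula, matching the ToothGrowth call in the message
f <- len ~ supp*dose

# Approach used in the message: deparse each side, then split the
# right-hand side on " * " or " + "
x.str  <- as.character(f[2])
by.str <- unlist(strsplit(as.character(f[3]), " [\\*\\+] "))

# Alternative (an assumption, not in the original code): all.vars()
# walks the expression and returns the variable names directly
x.alt  <- all.vars(f[[2]])
by.alt <- all.vars(f[[3]])

# A wrapped term shows where the two approaches differ
g <- len ~ supp*factor(dose)
by.split <- unlist(strsplit(as.character(g[3]), " [\\*\\+] "))  # keeps "factor(dose)" as one string
by.vars  <- all.vars(g[[3]])                                    # "supp" "dose"
```

With string splitting, a term such as `factor(dose)` is kept as a single name and would then be looked up as a column called `factor(dose)`, so the function as posted expects plain variable names on the right-hand side.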