[Rd] Suggestion to extend aggregate() to return multiple and/or named values

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jul 13 19:04:03 CEST 2007


Note that summaryBy() in the doBy package can also return multiple
and/or named values per cell (z, Ind and summary2 are as defined in
your post below):

library(doBy)
DF <- data.frame(z, A = Ind$A, B = Ind$B)
summaryBy(z ~ A + B, DF, FUN = summary)
summaryBy(z ~ A + B, DF, FUN = summary2)
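
For anyone without doBy installed, here is a rough base-R sketch of the same kind of per-cell summary. The helper name cell_summary() and its arguments are hypothetical (they are not part of base R or doBy), and the data setup just mirrors the example in the quoted post with a fixed seed added. Unnamed results get the default names x, x2, ... following the convention of the agg() function quoted below.

```r
# Hypothetical helper: apply FUN to one column within each cell defined by
# the 'by' columns, keeping every (possibly named) value FUN returns.
cell_summary <- function(data, value, by, FUN, ...) {
  # One data frame per non-empty combination of the grouping columns
  groups <- split(data, data[by], drop = TRUE)

  # Grouping values for each cell (first row of each piece)
  keys <- do.call(rbind, lapply(groups, function(g) g[1, by, drop = FALSE]))

  # One row of results per cell; names of the result become column names
  stats <- do.call(rbind, lapply(groups, function(g) {
    vals <- unclass(FUN(g[[value]], ...))
    if (is.null(names(vals))) {
      nm <- paste0("x", seq_along(vals))  # x1, x2, ...
      nm[1] <- "x"                        # first column is plain 'x'
      names(vals) <- nm
    }
    vals
  }))

  data.frame(keys, stats, check.names = FALSE, row.names = NULL)
}

# Data laid out as in the quoted post (seed added here for reproducibility)
set.seed(1)
z <- rnorm(100)
A <- rep(1:2, each = 25, times = 2)
B <- rep(1:2, each = 50)
DF <- data.frame(z = z, A = A, B = B)

res_named   <- cell_summary(DF, "z", c("A", "B"), summary)  # named columns
res_single  <- cell_summary(DF, "z", c("A", "B"), mean)     # single column 'x'
print(res_named)
```

This sketch assumes FUN returns a vector of the same length for every cell; it does no checking for ragged results.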

On 7/13/07, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
> Hi all,
>
> This is my first post to the developers list. As I understand it,
> aggregate() currently applies a function to each cell of a data frame
> but can only handle functions that return a single value.
> aggregate() also does not retain the names given to the
> returned values. I've created an agg() function (pasted below) that
> appears to be backwards compatible (i.e. it returns the same results as
> aggregate() if the function returns a single unnamed value), but is
> able to handle named and/or multiple return values. The code may be a
> little inefficient (there must be an easier way to set up the 'temp'
> data frame than to call aggregate() and remove the final column), but
> I'm suggesting that something similar to this could profitably be used
> to replace aggregate() entirely.
>
> #modified aggregate command, allowing for multiple/named output values
> agg=function(z,Ind,FUN,...){
>        FUN.out=by(z,Ind,FUN,...)
>        num.cells=length(FUN.out)
>        num.dv=length(FUN.out[[1]])
>
>        temp=aggregate(z,Ind,length) #dummy data frame
>        temp=temp[,c(1:(length(temp)-1))] #remove last column from dummy frame
>
>        for(i in 1:num.dv){
>                temp=cbind(temp,NA)
>                n=names(FUN.out[[1]])[i]
>                names(temp)[length(temp)]=ifelse(!is.null(n),n,ifelse(i==1,'x',paste('x',i,sep='')))
>                for(j in 1:num.cells){
>                        temp[j,length(temp)]=FUN.out[[j]][i]
>                }
>        }
>        return(temp)
> }
>
> #create some factored data
> z=rnorm(100) # the DV
> A=rep(1:2,each=25,2) #one factor
> B=rep(1:2,each=50) #another factor
> Ind=list(A=A,B=B) #the factor list
>
> aggregate(z,Ind,mean) #show the means of each cell
> agg(z,Ind,mean) #should be identical to aggregate
>
> aggregate(z,Ind,summary) #returns an error
> agg(z,Ind,summary) #returns named columns
>
> #Make a function that returns multiple unnamed values
> summary2=function(x){
>        s=summary(x)
>        names(s)=NULL
>        return(s)
> }
> agg(z,Ind,summary2) #returns multiple columns, default names
>
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> Website: http://memetic.ca
>
> Public calendar: http://icalx.com/public/informavore/Public
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
>        - Piet Hein
>


