[R] best way to apply a list of functions to a dataset ?

Glen Barnett glnbrntt at gmail.com
Tue Jul 20 07:51:29 CEST 2010


Assuming I have a matrix of data (or under some restrictions that will
become obvious, possibly a data frame), I want to be able to apply a
list of functions (initially producing a single number from a vector)
to the data and produce a data frame (for compact output) with column
1 being the function results for the first function, column 2 being
the results for the second function and so on - with each row being
the columns of the original data.

The obvious application of this is to produce summaries of data sets
(a bit like summary() does on numeric matrices), but with user
supplied functions. I am content for the moment to leave it to the
user to supply functions that work with the data they supply so as to
produce results that will actually be data-frame-able, though I'd like
to ultimately make it a bit nicer than it currently is without
compromising the niceness of the output in the "good" cases.

The example below is a simplistic approach to this problem (it should
run as is). I have named it "fapply" for fairly obvious reasons, but
added the ".1" because it doesn't accept multidimensional arrays. I
have included the output I generated, which is what I want. There are
some obvious generalizations (e.g. being able to include functions
like range(), say, that produce several values on a vector, rather
than one, making the user's life simpler when a function already does
most of what they need).

The question is: this looks like a silly approach, growing a list
inside a for loop. Also I recall reading that if you find yourself
using "do.call" you should probably be doing something else.

So my question: Is there a better way to implement a function like this?

Or, even better, is there already a function that does this?

## example function and code to apply a list of functions to a matrix
(here a numeric data frame)

library(datasets)

fapply.1 <- function(x, fun.l, colnames=fun.l){
out.l <- list()   # starts with an empty list
for (i in seq_along(fun.l)) out.l[[i]] <- apply(x,2,fun.l[[i]])   #
loop through list of functions

# set up names and make into a data frame
names(out.l) <- colnames
attr(out.l,"row.names") <- names(out.l[[1]])
attr(out.l,"class") <- "data.frame"
out.l
}

skewness <- function(x) mean(scale(x)^3)      #define a simple numeric function
mean.gt.med <- function(x) mean(x)>median(x)  # define a simple non-numeric fn
flist <- c("mean","sd","skewness","median","mean.gt.med") # make list
of fns to apply

fapply.1(attitude,flist)
              mean        sd    skewness median mean.gt.med
rating     64.63333 12.172562 -0.35792491   65.5       FALSE
complaints 66.60000 13.314757 -0.21541749   65.0        TRUE
privileges 53.13333 12.235430  0.37912287   51.5        TRUE
learning   56.36667 11.737013 -0.05403354   56.5       FALSE
raises     64.63333 10.397226  0.19754317   63.5        TRUE
critical   74.76667  9.894908 -0.86577893   77.5       FALSE
advance    42.93333 10.288706  0.85039799   41.0        TRUE

## end code and output

So did I miss something obvious?

Any suggestions as far as style or simple stability-enhancing
improvements would be handy.

regards,
Glen



More information about the R-help mailing list