[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?

Marius Hofert m_hofert at web.de
Wed Aug 17 12:42:21 CEST 2011


Dear all,

First, let's create some data to play around with:

set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10), 
                 Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])

## Now we need the empirical distribution function:
edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x

## The big question is how to apply the empirical distribution function to 
## each subset of df determined by "Group", that is, to Group1, then to Group2, 
## and finally to Group3. One suggestion might be to use tapply:

(edf. <- tapply(df$Value, df$Group, FUN=edf))

## That gives the correct values, but only as a list with one vector per group. 
## Typically, one would like to obtain a data.frame containing the original 
## information together with the new (edf-)values. What's a simple way to get 
## this? (One could first sort df according to Group and then attach the values 
## computed by edf to the sorted df, but that seems a bit tedious.) 
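
For concreteness, the sort-then-attach route mentioned in the parenthesis could be sketched as follows. This is only an illustration: it relies on order() leaving ties in their original order, so that the rows within each group line up with the values tapply sees, and the name `sorted` is introduced here purely for the example.

```r
# Rebuild the example data so that the sketch runs on its own.
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]

edf <- function(x) ecdf(x)(x)  # empirical distribution function evaluated at x

# Sort the rows by group; ties keep their original relative order.
sorted <- df[order(df$Group), ]

# tapply() yields one vector per group, in sorted group order; flatten the
# result and attach it to the sorted data frame as a new column.
sorted$edf <- unlist(tapply(df$Value, df$Group, FUN = edf), use.names = FALSE)

head(sorted)
```

The result is ordered by group, so the original (shuffled) row order of df is not preserved.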
## A solution I have is the following (but I would like to know if there is a 
## simpler one):

(edf.. <- do.call("rbind", lapply(unique(df$Group), function(strg){
    subdata <- subset(df, Group==strg) # rows belonging to this group
    cbind(subdata, edf=edf(subdata$Value)) # append the edf values as a new column
})) )
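
One candidate for a simpler route, offered as a sketch rather than a definitive answer, is base R's ave(), which applies a function within each level of a grouping variable and returns a vector aligned with the original rows. The name `res` is introduced here only for illustration.

```r
# Rebuild the example data so that the sketch runs on its own.
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]

edf <- function(x) ecdf(x)(x)  # empirical distribution function evaluated at x

# ave() evaluates FUN separately within each group and puts the results back
# in the positions of the original rows, so no sorting or rbind is needed.
res <- df
res$edf <- ave(res$Value, res$Group, FUN = edf)

head(res)
```

Unlike the rbind approach, this keeps the rows of df in their original order.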


Cheers,

Marius

