[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?

Paul Hiemstra paul.hiemstra at knmi.nl
Wed Aug 17 13:34:19 CEST 2011


 On 08/17/2011 11:24 AM, Nick Sabbe wrote:
> You might want to look at package plyr and use ddply.

The following example does what you want using ddply:

library(plyr)
edfPerGroup = ddply(df, .(Group), summarise, edf = edf(Value), Value = Value)
> edfPerGroup
    Group edf       Value
1  Group1 0.5 0.539682840
2  Group1 0.2 0.145706727
3  Group1 0.7 0.956567494
4  Group1 0.3 0.147045991
5  Group1 0.9 1.229562053
6  Group1 0.4 0.436068626
7  Group1 0.8 1.181642779
8  Group1 0.1 0.139795262
9  Group1 1.0 2.894968537
10 Group1 0.6 0.755181833
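
A base-R alternative, sketched here under the assumption that df and edf are defined as in the original post quoted below: ave() applies a function within each group and returns the results in the original row order, so they can be assigned straight back as a new column without sorting or rbind-ing. The new column name "edf" is an arbitrary choice for illustration.

```r
## Data and helper as defined in the original post
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4),
                           rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x

## ave() evaluates edf() separately within each Group and puts every
## result back at the position of the row it was computed from
df$edf <- ave(df$Value, df$Group, FUN = edf)

head(df)
```

Unlike the ddply call, this keeps the rows in their original (shuffled) order; use df[order(df$Group), ] afterwards if a grouped layout is wanted.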

cheers,
Paul



> HTH,
>
>
> Nick Sabbe
> --
> ping: nick.sabbe at ugent.be
> link: http://biomath.ugent.be
> wink: A1.056, Coupure Links 653, 9000 Gent
> ring: 09/264.59.36
>
> -- Do Not Disapprove
>
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Marius Hofert
>> Sent: Wednesday 17 August 2011 12:42
>> To: Help R
>> Subject: [R] How to apply a function to subsets of a data frame *and*
>> obtain a data frame again?
>>
>> Dear all,
>>
>> First, let's create some data to play around with:
>>
>> set.seed(1)
>> (df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
>>                  Value=c(rexp(10, 1), rexp(10, 4),
>>                          rexp(10, 10)))[sample(1:30,30),])
>>
>> ## Now we need the empirical distribution function:
>> edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x
>>
>> ## The big question is how one can apply the empirical distribution
>> ## function to each subset of df determined by "Group", so how to apply
>> ## it to Group1, then to Group2, and finally to Group3. You might
>> ## suggest (?) to use tapply:
>>
>> (edf. <- tapply(df$Value, df$Group, FUN=edf))
>>
>> ## That's correct. But typically, one would like to obtain not only the
>> ## values, but a data.frame containing the original information and the
>> ## new (edf-)values. What's a simple way to get this? (one would be
>> ## required to first sort df according to Group, then paste the values
>> ## computed by edf to the sorted df; seems a bit tedious).
>> ## A solution I have is the following (but I would like to know if
>> ## there is a simpler one):
>>
>> (edf.. <- do.call("rbind", lapply(unique(df$Group), function(strg){
>>     subdata <- subset(df, Group==strg) # sub-data
>>     subdata <- cbind(subdata, edf=edf(subdata$Value))
>> })) )
>>
>>
>> Cheers,
>>
>> Marius
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


