[R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

Chuck Cleland ccleland at optonline.net
Wed Oct 21 14:16:28 CEST 2009


On 10/21/2009 7:03 AM, Tony Breyal wrote:
> Dear all,
> 
> Let's say I have the following data frame:
> 
>> set.seed(1)
>> col1 <- c(rep('happy',9), rep('sad', 9))
>> col2 <- rep(c(rep('alpha', 3), rep('beta', 3), rep('gamma', 3)),2)
>> dates <- as.Date(rep(c('2009-10-13', '2009-10-14', '2009-10-15'),6))
>> score <- rnorm(18, 10, 3)
>> df1<-data.frame(col1=col1, col2=col2, Date=dates, score=score)
> 
>     col1  col2       Date     score
> 1  happy alpha 2009-10-13  8.120639
> 2  happy alpha 2009-10-14 10.550930
> 3  happy alpha 2009-10-15  7.493114
> 4  happy  beta 2009-10-13 14.785842
> 5  happy  beta 2009-10-14 10.988523
> 6  happy  beta 2009-10-15  7.538595
> 7  happy gamma 2009-10-13 11.462287
> 8  happy gamma 2009-10-14 12.214974
> 9  happy gamma 2009-10-15 11.727344
> 10   sad alpha 2009-10-13  9.083835
> 11   sad alpha 2009-10-14 14.535344
> 12   sad alpha 2009-10-15 11.169530
> 13   sad  beta 2009-10-13  8.136278
> 14   sad  beta 2009-10-14  3.355900
> 15   sad  beta 2009-10-15 13.374793
> 16   sad gamma 2009-10-13  9.865199
> 17   sad gamma 2009-10-14  9.951429
> 18   sad gamma 2009-10-15 12.831509
> 
> 
> Is it possible to get the following, whereby I am averaging the values
> within each group of values in col2:
> 
>     col1  col2       Date     score   Average
> 1  happy alpha 13/10/2009  8.120639  8.721561
> 2  happy alpha 14/10/2009 10.550930  8.721561
> 3  happy alpha 15/10/2009  7.493114  8.721561
> 4  happy  beta 13/10/2009 14.785842 11.104320
> 5  happy  beta 14/10/2009 10.988523 11.104320
> 6  happy  beta 15/10/2009  7.538595 11.104320
> 7  happy gamma 13/10/2009 11.462287 11.801535
> 8  happy gamma 14/10/2009 12.214974 11.801535
> 9  happy gamma 15/10/2009 11.727344 11.801535
> 10   sad alpha 13/10/2009  9.083835 11.596236
> 11   sad alpha 14/10/2009 14.535344 11.596236
> 12   sad alpha 15/10/2009 11.169530 11.596236
> 13   sad  beta 13/10/2009  8.136278  8.288990
> 14   sad  beta 14/10/2009  3.355900  8.288990
> 15   sad  beta 15/10/2009 13.374793  8.288990
> 16   sad gamma 13/10/2009  9.865199 10.882712
> 17   sad gamma 14/10/2009  9.951429 10.882712
> 18   sad gamma 15/10/2009 12.831509 10.882712
> 
> 
> My feeling is that I should be using ?aggregate in some fashion
> but I can't see how to do it. Or possibly there's another function I
> should be using?

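See ?ave

  ave() returns a vector the same length as its input, with each value
replaced by the mean of its group, so the result can be added as a new
column directly. For example, try something like this: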

transform(df1, Average = ave(score, col1, col2))
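
  If it helps to see the pieces side by side, here is a minimal sketch
of the ave() route next to an aggregate() plus merge() route. It assumes
the df1 built in the question; the object names with_ave, with_median,
group_means and with_merge are made up for illustration only.

```r
# Rebuild the example data from the question.
set.seed(1)
df1 <- data.frame(
  col1  = c(rep("happy", 9), rep("sad", 9)),
  col2  = rep(rep(c("alpha", "beta", "gamma"), each = 3), 2),
  Date  = as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6)),
  score = rnorm(18, 10, 3)
)

# ave() returns a vector as long as 'score', holding each row's group mean.
# Grouping variables are passed positionally; FUN defaults to mean.
with_ave <- transform(df1, Average = ave(score, col1, col2))

# A different summary must be passed by name, because unnamed arguments
# after the first are all treated as grouping variables.
with_median <- transform(df1, Median = ave(score, col1, col2, FUN = median))

# The aggregate() route: one row per group, then merge back onto the data.
group_means <- aggregate(df1["score"],
                         by = list(col1 = df1$col1, col2 = df1$col2),
                         FUN = mean)
names(group_means)[names(group_means) == "score"] <- "Average"
with_merge <- merge(df1, group_means, by = c("col1", "col2"))
```

  Both routes should give the same Average column. The difference is
that ave() keeps the rows in their original order, while merge() sorts
on the key columns and makes no promise about the order within a group.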

> Thanks in advance,
> Tony
> 
> O/S: Windows Vista Ultimate
>> sessionInfo()
> R version 2.9.2 (2009-08-24)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



