[R] FUN argument to return a vector in aggregate function

Gabor Grothendieck ggrothendieck at gmail.com
Wed May 5 23:50:09 CEST 2010


Try this:

do.call("rbind", by(d, d[1:2], function(x) with(x, data.frame(x[1, 1:2],
  `mean c` = mean(c), `sum d` = sum(d), `has X` = "X" %in% e,
  check.names = FALSE))))
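To see why this yields one row per observed (a, b) group, here is a small sketch on a hypothetical four-row toy data frame (toy, per_group and res are illustrative names, not from the thread) in which the combination a = "b", b = 2 never occurs. by() calls the function once per cell and returns NULL for the empty cell, and rbind() simply skips NULL arguments.

```r
# Hypothetical toy data (not from the thread): the cell a = "b", b = 2 is empty.
toy <- data.frame(a = c("a", "a", "a", "b"),
                  b = c(1, 1, 2, 1),
                  c = c(1, 2, 3, 4),
                  d = c(0, 1, 1, 0),
                  e = c("X", "Y", "Y", "Y"))

# by() calls the function once per (a, b) cell; the empty cell comes back as NULL.
per_group <- by(toy, toy[1:2], function(x) with(x, data.frame(x[1, 1:2],
  `mean c` = mean(c), `sum d` = sum(d), `has X` = "X" %in% e,
  check.names = FALSE)))

# rbind() drops the NULL cell, so only the three observed groups become rows.
res <- do.call("rbind", per_group)
res
```

Here res has rows for (a, 1), (b, 1) and (a, 2) only; the missing combination is simply absent rather than filled with NA.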

or this (which uses 1 and 0 to mean TRUE and FALSE in the last column):

> library(sqldf) # see http://sqldf.googlecode.com
> sqldf("select a, b, avg(c) 'mean c', sum(d) 'sum d', sum(e = 'X')>0 'has X' from d group by a, b", method = "raw")
  a b    mean c sum d has X
1 a 1 0.3333333     2     1
2 a 2 0.2500000     2     1
3 a 3 1.4000000     4     1
4 b 1 0.0000000     0     0
5 b 2 0.6666667     1     1
6 b 3 0.7500000     2     1

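or this, which leaves out check.names = FALSE, so data.frame() converts the
new column names to syntactic ones (mean.c, sum.d, has.X):

do.call("rbind", by(d, d[1:2], function(x) with(x, data.frame(x[1, 1:2],
  `mean c` = mean(c), `sum d` = sum(d), `has X` = "X" %in% e))))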
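The same split-apply-combine can also be written without by(), using split() with drop = TRUE plus lapply(). This is a swapped-in base-R alternative, not one of the suggestions above, and groups, pieces and res2 are illustrative names. The data are built as in the question quoted below; note that R 3.6.0 changed the default algorithm behind sample(), so set.seed(100) may not reproduce the numbers printed above unless RNGkind(sample.kind = "Rounding") is called first.

```r
# Data as in the question (exact values depend on the R version's sample()).
set.seed(100)
d <- data.frame(a = sample(letters[1:2], 20, replace = TRUE),
                b = sample(3, 20, replace = TRUE),
                c = rpois(20, 1),
                d = rbinom(20, 1, 0.5),
                e = rep(c("X", "Y"), 10))

# Split rows by every observed (a, b) combination; drop = TRUE skips
# combinations that do not occur, so no empty pieces are produced.
groups <- split(d, d[c("a", "b")], drop = TRUE)

# Summarise each piece into a one-row data frame, then stack the rows.
pieces <- lapply(groups, function(x) data.frame(
  a = x$a[1], b = x$b[1],
  `mean c` = mean(x$c), `sum d` = sum(x$d), `has X` = "X" %in% x$e,
  check.names = FALSE))
res2 <- do.call(rbind, pieces)
rownames(res2) <- NULL
res2
```

Because the groups partition the rows, the "sum d" column always adds up to sum(d$d), which is a quick sanity check on the result.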


On Wed, May 5, 2010 at 5:32 PM, utkarshsinghal
<utkarsh.singhal at global-analytics.com> wrote:
> Extending my question further: I want to apply a different FUN to each of
> three fields, and the "by" argument also contains more than one field.
> For example:
> set.seed(100)
> d = data.frame(a = sample(letters[1:2], 20, replace = TRUE),
>                b = sample(3, 20, replace = TRUE), c = rpois(20, 1),
>                d = rbinom(20, 1, 0.5), e = rep(c("X", "Y"), 10))
>
> Now I want to split by fields "a" and "b" and calculate mean(c), sum(d)
> and "X" %in% e for each group.
>
> Is there a function that can do this and return the output as a data
> frame? For the above example, it should ideally be a 6 x 5 data frame.
>
> Thanks in advance.
>
> Regards,
> Utkarsh Singhal
>
> On 11/23/2009 5:14 AM, Gabor Grothendieck wrote:
>>
>> Try this:
>>
>> > library(doBy)
>> > summaryBy(breaks ~ ., warpbreaks, FUN = c(mean, sum, length))
>>
>>   wool tension breaks.mean breaks.sum breaks.length
>> 1    A       L    44.55556        401             9
>> 2    A       M    24.00000        216             9
>> 3    A       H    24.55556        221             9
>> 4    B       L    28.22222        254             9
>> 5    B       M    28.77778        259             9
>> 6    B       H    18.77778        169             9
>>
>> On Mon, Nov 23, 2009 at 3:15 AM, utkarshsinghal
>> <utkarsh.singhal at global-analytics.com>  wrote:
>>
>>>
>>> Hi All,
>>>
>>> I am currently doing the following to compute summary statistics of
>>> aggregated data:
>>> a = aggregate(warpbreaks$breaks, warpbreaks[,-1], mean)
>>> b = aggregate(warpbreaks$breaks, warpbreaks[,-1], sum)
>>> c = aggregate(warpbreaks$breaks, warpbreaks[,-1], length)
>>> ans = cbind(a, b[,3], c[,3])
>>>
>>> This seems unnecessarily complex to me, so I tried
>>>
>>> > aggregate(warpbreaks$breaks, warpbreaks[,-1],
>>>             function(z) c(mean(z), sum(z), length(z)))
>>>
>>> but aggregate doesn't allow the FUN argument to return a vector.
>>>
>>> I tried "by", "tapply" and several other functions as well, but their
>>> output needed further modification to get the same format as "ans" above.
>>>
>>> Is there another function like aggregate that allows the FUN argument to
>>> return a vector?
>>>
>>> Regards
>>> Utkarsh
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>
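For readers finding this thread later: in current versions of R, aggregate() does accept a FUN that returns a fixed-length vector, which addresses the original warpbreaks question directly. A minimal sketch, assuming a current R (agg and flat are illustrative names): the three summaries land in a single matrix column named breaks, and do.call(data.frame, ...) spreads that matrix into ordinary columns.

```r
# FUN returns a named vector; aggregate() stores the results as a matrix column.
agg <- aggregate(breaks ~ wool + tension, data = warpbreaks,
                 FUN = function(z) c(mean = mean(z), sum = sum(z), length = length(z)))

# Flatten the matrix column into breaks.mean, breaks.sum and breaks.length.
flat <- do.call(data.frame, agg)
flat
```

The result has the same columns as the summaryBy() output quoted above (wool, tension, breaks.mean, breaks.sum, breaks.length), although aggregate() orders the rows with wool varying fastest.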


