[R] aggregate vs tapply; is there a middle ground?

Mon Feb 13 08:52:08 CET 2006

Thanks Peter!

I had a "feeling" that there must be a simpler, better, more elegant 
solution.

/Hans

Peter Dalgaard wrote:
> hadley wickham <h.wickham at gmail.com> writes:
>
>   
>>> I faced a similar problem. Here's what I did
>>>
>>> tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
>>>                   B=sample(letters[1:5],10,replace=TRUE),
>>>                   C=rnorm(10))
>>> tmp1 <- with(tmp,aggregate(C,list(A=A,B=B),sum))
>>> tmp2 <- expand.grid(A=sort(unique(tmp$A)),B=sort(unique(tmp$B)))
>>> merge(tmp2,tmp1,all.x=TRUE)
>>>
>>> At least fewer than 10 extra lines of code. Anyone with a simpler solution?
>>>       
>> Well, you can almost do this with the reshape package:
>>
>> tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
>>                   B=sample(letters[1:5],10,replace=TRUE),
>>                   C=rnorm(10))
>> a <- recast(tmp, A + B ~ ., sum)
>> # see also recast(tmp, A  ~ B, sum)
>> add.all.combinations(a, row="A", cols = "B")
>>
>> Where add.all.combinations basically does what you outlined above --
>> it would be easy enough to generalise to multiple dimensions.
>>     
>
> Anything wrong with
>
>   
>> as.data.frame(with(tmp,as.table(tapply(C,list(A=A,B=B),sum))))
>>     
>    A B       Freq
> 1  A a         NA
> 2  B a -0.2524320
> 3  C a  3.8539264
> 4  D a         NA
> 5  A c  0.7227294
> 6  B c -0.2694669
> 7  C c  0.4760957
> 8  D c         NA
> 9  A e         NA
> 10 B e  0.1800500
> 11 C e         NA
> 12 D e -1.0350928
>
> (except for the silly column name; responseName="sum" should fix that).
>
>   
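
A minimal sketch of Peter's suggestion with the responseName fix applied.
The fixed seed and the names tab and res are my own additions, only there
to make the example reproducible and readable:

```r
set.seed(1)  # assumption: fixed seed, not part of the original example
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))

# tapply() returns a matrix with one cell per observed A x B combination;
# combinations that never occur in the data are filled with NA
tab <- with(tmp, tapply(C, list(A = A, B = B), sum))

# as.table() + as.data.frame() turns the matrix back into long format;
# responseName replaces the default "Freq" column name
res <- as.data.frame(as.table(tab), responseName = "sum")
res
```

As with the expand.grid/merge version, only levels of A and B that actually
occur in tmp show up, so the result should have
length(unique(tmp$A)) * length(unique(tmp$B)) rows.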

-- 

*********************************
Hans Gardfjell
Ecology and Environmental Science
Umeå University
90187 Umeå, Sweden
email: hans.gardfjell at emg.umu.se
phone:  +46 907865267
mobile: +46 705984464