[R] aggregate slow with many rows - alternative?

Gabor Grothendieck ggrothendieck at gmail.com
Fri Oct 14 02:29:18 CEST 2005


Convert dat to a matrix and see if working with the matrix instead of
a data frame speeds things up enough.  The per-group data frame
subsetting is likely what makes by() so slow here, and the same row
access on a matrix is much cheaper.
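
For example, a rough sketch of the matrix route (untested; the key and
first helpers and the use of tapply() are my own additions for
illustration, not something in your code):

  m     <- as.matrix(dat)                    # plain numeric matrix
  key   <- paste(m[, 1], m[, 2], sep = "-")  # one key per Datum/FischerID pair
  first <- !duplicated(key)                  # first row of each group
  sums  <- tapply(m[, 3], key, sum)          # total Anzahl per group
  cnts  <- tapply(m[, 3], key, length)       # number of rows per group
  t.a   <- data.frame(Datum     = m[first, 1],
                      FischerID = m[first, 2],
                      Anzahl    = as.vector(sums[key[first]]),
                      Cnt       = as.vector(cnts[key[first]]))
  t.a   <- t.a[order(t.a$Datum, t.a$FischerID), ]

On your 10-row example this should give the same summary as the by()
version; whether it is fast enough on the 33,000-row data you would
have to try.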

On 10/13/05, Hans-Peter <gchappi at gmail.com> wrote:
> Hi,
>
> I use the code below to aggregate / count my test data. It works fine,
> but the problem is with my real data (33,000 rows), where the function
> is really slow (nothing happened in half an hour).
>
> Does anybody know of other functions that I could use?
>
> Thanks,
> Hans-Peter
>
> --------------
> dat <- data.frame(
>   Datum     = c(32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699),
>   FischerID = c(58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870),
>   Anzahl    = c(2, 2, 1, 1, 2, 1, 7, 1, 1, 2)
> )
> # one summary row per Datum/FischerID group: sum of Anzahl plus row count
> f <- function(x) data.frame(Datum = x[1, 1], FischerID = x[1, 2],
>                             Anzahl = sum(x[, 3]), Cnt = dim(x)[1])
> t.a <- do.call("rbind", by(dat, dat[, 1:2], f))   # slow for 33,000 rows
> t.a <- t.a[order(t.a[, 1], t.a[, 2]), ]
>
>  # show data
> dat
> t.a
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



