[R] aggregate / collapse big data frame efficiently

Patrick Burns pburns at pburns.seanet.com
Tue Dec 25 19:12:18 CET 2012


I'd suggest the 'data.table' package.  Fast grouped
aggregation is one of the prime uses it was created for.
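As a rough sketch of what that looks like (the column name 'grp' and the
sizes are illustrative, matching your toy example rather than your real data):

```r
library(data.table)

# Toy data in the same shape as your example: one grouping
# column plus several numeric columns.
DT <- data.table(grp = rep(letters, 2),
                 v1 = rnorm(52), v2 = rnorm(52), v3 = rnorm(52))

# Group-wise mean of every non-grouping column in one pass.
# .SD is the subset of columns for each group.
DT[, lapply(.SD, mean), by = grp]
```

If your data already lives in a data.frame, setDT() converts it
in place without copying, which matters at your sizes.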

Pat

On 25/12/2012 16:34, Martin Batholdy wrote:
> Hi,
>
>
> I need to aggregate rows of a data.frame by computing the mean of all rows that share the same level of one factor variable;
>
> here is the sample code:
>
>
> x <- data.frame(rep(letters,2), rnorm(52), rnorm(52), rnorm(52))
>
> aggregate(x[, -1], by = list(x[, 1]), FUN = mean)
>
>
> Now my problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very long (at some point I just stopped it).
>
> Is there anything that can be done to make the aggregate routine more efficient?
> Or is there a different approach that would work faster?
>
>
> Thanks for any suggestions!
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')



