[R] Grouping data in a data frame: is there an efficient way to do it?

jim holtman jholtman at gmail.com
Thu Sep 3 01:13:20 CEST 2009


This takes about 0.6 seconds on my slow laptop using tapply():

> n <- 1e6
> x <- data.frame(a=sample(LETTERS, n, TRUE))
> system.time(print(tapply(x$a, x$a, length)))
    A     B     C     D     E     F     G     H     I     J     K     L     M     N     O     P     Q
38555 38349 38647 38271 38456 38352 38644 38679 38575 38730 38471 38379 38540 38413 38365 38501 38555
    R     S     T     U     V     W     X     Y     Z
38379 38417 38326 38509 38238 38395 38625 38175 38454
   user  system elapsed
   0.59    0.02    0.63
>
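
If all you need is the group sizes, table() is probably worth trying as well. The sketch below is not from the timing above; it only illustrates that table() and the tapply() call should return the same counts on the same kind of data. The set.seed() call and the variable names counts_table and counts_tapply are my own additions for illustration, and I have not timed the two against each other here.

```r
# Same setup as in the timing above: one column of letters, a million rows.
set.seed(1)  # assumption: added only so repeated runs give the same sample
n <- 1e6
x <- data.frame(a = sample(LETTERS, n, TRUE))

# table() counts how often each distinct value occurs.
counts_table <- table(x$a)

# tapply() applies a function to each group; with length it yields the counts.
counts_tapply <- tapply(x$a, x$a, length)

# Compare the two results value by value.
same_counts <- all(as.vector(counts_table) == as.vector(counts_tapply))
```

tapply() remains the more general tool, since length can be swapped for any other per-group function.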




On Wed, Sep 2, 2009 at 6:39 PM, Leo Alekseyev<dnquark at gmail.com> wrote:
> I have a data frame with about 10^6 rows; I want to group the data
> according to entries in one of the columns and do something with it.
> For instance, suppose I want to count up the number of elements in
> each group.  I tried something like aggregate(my.df$my.field,
> list(my.df$my.field), length) but it seems to be very slow.  Likewise,
> the split() function was slow (I killed it before it completed).  Is
> there a way to efficiently accomplish this in R?..  I am almost
> tempted to write an external Perl/Python script entering every row
> into a hashtable keyed by my.field and iterating over the keys...
> Might this be faster?..
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



