[R] Grouping data in a data frame: is there an efficient way to do it?

Leo Alekseyev dnquark at gmail.com
Thu Sep 3 00:39:40 CEST 2009


I have a data frame with about 10^6 rows. I want to group the rows
according to the entries in one of the columns and then do something
with each group. For instance, suppose I want to count the number of
elements in each group. I tried something like
aggregate(my.df$my.field, list(my.df$my.field), length), but it seems
to be very slow. Likewise, the split() function was slow (I killed it
before it completed). Is there a way to do this efficiently in R? I am
almost tempted to write an external Perl/Python script that enters
every row into a hash table keyed by my.field and then iterates over
the keys. Might this be faster?
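
For the counting case specifically, here is a minimal sketch in base R
of two alternatives to aggregate(). The data frame below is a made-up
stand-in (the real my.df and the type of my.field are not shown in this
post), and no timings are claimed; whether either call is fast enough
on the real data would have to be measured, e.g. with system.time().

```r
# Hypothetical stand-in for the real data: 10^6 rows, one grouping column.
set.seed(1)
my.df <- data.frame(my.field = sample(letters, 1e6, replace = TRUE),
                    stringsAsFactors = FALSE)

# Count the rows in each group directly with table().
counts <- table(my.df$my.field)

# The same counts via tapply(), which also accepts other
# per-group functions (sum, mean, ...) in place of length.
counts2 <- tapply(my.df$my.field, my.df$my.field, length)

# Both results are indexed by group name, much like a hash table
# keyed by my.field.
head(counts)
```

table() only covers counting; tapply() is the more general form when
some other summary is needed for each group.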