[R] Merging rows in dataframes

Schraga Schwartz schragas at post.tau.ac.il
Mon Mar 23 22:58:33 CET 2009


Hello,

I have a data frame with 40 columns and around 450,000 rows. The first
column is a factor id and the remaining columns are numeric. Some rows
share the same id. What I want to do is merge each set of rows sharing
the same id (an id set) into one single row (a summarizing row) with
that id. To create the summarizing row, I'd like to apply a different
function to each of the original columns in the id set: some columns
of the summarizing row will equal the mean of the corresponding column
in the id set, others the minimum, and others the maximum.
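
To make the goal concrete, here is a minimal sketch of the operation on a
toy data frame. The column names (id, a, b, c) and the assignment of mean,
min and max to particular columns are made up for illustration; the real
data has 39 numeric columns. It handles one column at a time as a plain
vector with tapply(), and has not been checked at the full 450,000-row scale.

```r
# Toy stand-in for the real data; all column names here are hypothetical.
df <- data.frame(
  id = factor(c("x", "x", "y", "y", "y")),
  a  = c(1, 3, 10, 20, 30),   # to be summarized by its mean
  b  = c(5, 2, 7, 9, 8),      # to be summarized by its minimum
  c  = c(4, 6, 1, 0, 2)       # to be summarized by its maximum
)

# One summarizing function per numeric column (assumed mapping).
funs <- list(a = mean, b = min, c = max)

# For each column, apply its function within each id.
# tapply() returns one value per level of df$id, in level order.
cols <- lapply(names(funs), function(nm) {
  as.vector(tapply(df[[nm]], df$id, funs[[nm]]))
})
names(cols) <- names(funs)

# Assemble one summarizing row per id.
summary_df <- data.frame(id = levels(df$id), cols, row.names = NULL)
print(summary_df)
```

In this toy case the result has two rows, one for id "x" and one for id "y".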

To do this, I tried using the by() function. However, this was
extremely slow (it ran for more than two hours before I stopped it),
and it used up all 16 GB of memory on my machine. Is there a more
efficient function, in terms of both time and memory, for this sort of
thing?

Thank you very much,
Schraga Schwartz



