[R] Merging rows in dataframes

Gabor Grothendieck ggrothendieck at gmail.com
Mon Mar 23 23:50:09 CET 2009


Using sqldf you only need two statements, infile <- file(...) and
DF <- sqldf("select min(a), max(b), mean(c), ... from infile group by id").
The file statement identifies the filename and the second reads it
into sqlite (without
going through R), summarizes it and then reads the summarized version
into R.  You may also need to provide info on its format if its not in the
default format.  See example 4a on home page and the other examples
there:
http://sqldf.googlecode.com


On Mon, Mar 23, 2009 at 5:58 PM, Schraga Schwartz
<schragas at post.tau.ac.il> wrote:
> Hello,
>
> I have a dataframe with 40 columns and around 450,000 rows. The first column
> in each row is a factor id and the remaining are numeric. Some rows have the
> same ids. What I want to do is to merge each set of rows sharing the same
> ids (id set) into one single row (summarizing row) with that id. To create
> the summarizing row, I'd like to apply a different function on each of the
> original columns in the id set. Some columns within the summarizing row will
> equal the mean of the columns in the id set, others will equal the minimum,
> others the maximum.
>
> To do this, I tried using the by() function. However, this was extremely
> slow (it ran for more than two hours before I stopped it). Also, it used up
> all of 16 GB of memory on my machine. Is there any more efficient function,
> both in terms of time and memory, to do this sort of thing?
>
> Thank you very much,
> Schraga Schwartz
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




More information about the R-help mailing list