[R] Merging rows in dataframes

Wed Mar 25 14:05:53 CET 2009

Thank you, your answer was extremely helpful. One last problem though: one
of the aggregate functions I'd like to apply to the columns is
concatenation (equivalent to the paste() function). So if a given
character column has the value "apple" in the first of three rows sharing
the same id, "banana" in the second, and "orange" in the third, I'd like
the summarizing row to contain "apple|banana|orange". Is there any way to
do this?

Thanks again,
Schragi
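
A minimal sketch of the output described above, using base R's aggregate()
with paste(collapse = "|") on toy data. The data frame df and the column
names id and fruit are made up for illustration, and this has not been
checked for speed or memory use on 450,000 rows.

```r
# Toy data: three rows share id "a", one row has id "b" (hypothetical names).
df <- data.frame(
  id    = c("a", "a", "a", "b"),
  fruit = c("apple", "banana", "orange", "pear"),
  stringsAsFactors = FALSE
)

# Collapse the character column within each id, separating values with "|".
merged <- aggregate(df["fruit"], by = df["id"],
                    FUN = function(x) paste(x, collapse = "|"))
print(merged)
```

Within each id the values are pasted in the order the rows appear in df.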

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com] 
Sent: Tuesday, March 24, 2009 12:50 AM
To: Schraga Schwartz
Cc: r-help at r-project.org
Subject: Re: [R] Merging rows in dataframes

Using sqldf you only need two statements: infile <- file(...) and
DF <- sqldf("select min(a), max(b), avg(c), ... from infile group by id").
The file() statement identifies the file, and the sqldf() statement reads it
into SQLite (without going through R), summarizes it there, and then reads
the summarized version into R.  You may also need to provide information on
the file's format if it's not in the default format.  See example 4a on the
home page and the other examples there:
http://sqldf.googlecode.com
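
The two statements might look roughly like this. It is only a sketch: the
temporary CSV file, its comma-separated layout with a header row, and the
column names a, b and c are assumptions made up for illustration. It uses
SQLite's avg() to compute the mean and passes the format explicitly through
sqldf's file.format argument.

```r
library(sqldf)

# Hypothetical input: a small comma-separated file with a header row.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = c("x", "x", "y"),
                     a = c(3, 1, 5), b = c(2, 9, 4), c = c(1, 3, 7)),
          csv_path, row.names = FALSE, quote = FALSE)

# Statement 1: identify the file (nothing is read into R here).
infile <- file(csv_path)

# Statement 2: import into SQLite, summarize per id, return the result to R.
DF <- sqldf("select id, min(a) as min_a, max(b) as max_b, avg(c) as mean_c
             from infile group by id",
            file.format = list(header = TRUE, sep = ","))
print(DF)
```

Only the summarized rows, one per id, end up in the R data frame DF.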

On Mon, Mar 23, 2009 at 5:58 PM, Schraga Schwartz
<schragas at post.tau.ac.il> wrote:
> Hello,
>
> I have a dataframe with 40 columns and around 450,000 rows. The first column
> in each row is a factor id and the remaining are numeric. Some rows have the
> same ids. What I want to do is to merge each set of rows sharing the same
> ids (id set) into one single row (summarizing row) with that id. To create
> the summarizing row, I'd like to apply a different function on each of the
> original columns in the id set. Some columns within the summarizing row will
> equal the mean of the columns in the id set, others will equal the minimum,
> others the maximum.
>
> To do this, I tried using the by() function. However, this was extremely
> slow (it ran for more than two hours before I stopped it). Also, it used up
> all of 16 GB of memory on my machine. Is there any more efficient function,
> both in terms of time and memory, to do this sort of thing?
>
> Thank you very much,
> Schraga Schwartz
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>