[R] Merging rows in dataframes

Wed Mar 25 14:06:24 CET 2009

In the Links box to the right on the sqldf home page click on
"SQLite - aggregate functions" and lookup group_concat.

On Wed, Mar 25, 2009 at 9:05 AM, Schragi Schwartz
<schragas at post.tau.ac.il> wrote:
> Thank you, your answer was extremely helpful. One last problem though: one
> of the aggregate functions I'd like to apply on the columns is
> concatentation (equivalent to the paste() function). So if I have a given
> character column in three separate rows sharing the same ids with the value
> "apple" in the first, "banana" in the second, and "orange" in the third, in
> the summarizing row I'd like to receive output in the form
> "apple|banana|orange". Is there any way to do this?
>
> Thanks again,
> Schragi
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Tuesday, March 24, 2009 12:50 AM
> To: Schraga Schwartz
> Cc: r-help at r-project.org
> Subject: Re: [R] Merging rows in dataframes
>
> Using sqldf you only need two statements, infile <- file(...) and
> DF <- sqldf("select min(a), max(b), mean(c), ... from infile group by id").
> The file statement identifies the filename and the second reads it
> into sqlite (without
> going through R), summarizes it and then reads the summarized version
> into R.  You may also need to provide info on its format if its not in the
> default format.  See example 4a on home page and the other examples
> there:
> http://sqldf.googlecode.com
>
>
> On Mon, Mar 23, 2009 at 5:58 PM, Schraga Schwartz
> <schragas at post.tau.ac.il> wrote:
>> Hello,
>>
>> I have a dataframe with 40 columns and around 450,000 rows. The first
> column
>> in each row is a factor id and the remaining are numeric. Some rows have
> the
>> same ids. What I want to do is to merge each set of rows sharing the same
>> ids (id set) into one single row (summarizing row) with that id. To create
>> the summarizing row, I'd like to apply a different function on each of the
>> original columns in the id set. Some columns within the summarizing row
> will
>> equal the mean of the columns in the id set, others will equal the
> minimum,
>> others the maximum.
>>
>> To do this, I tried using the by() function. However, this was extremely
>> slow (it ran for more than two hours before I stopped it). Also, it used
> up
>> all of 16 GB of memory on my machine. Is there any more efficient
> function,
>> both in terms of time and memory, to do this sort of thing?
>>
>> Thank you very much,
>> Schraga Schwartz
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>