[R] Merging rows in dataframes

Gabor Grothendieck ggrothendieck at gmail.com
Sun Mar 29 14:58:59 CEST 2009


Example 7c has been aded to the home page which illustrates how
to do this using nested selects.
http://sqldf.googlecode.com

2009/3/29 Schragi Schwartz <schragas at post.tau.ac.il>:
> Hi,
> Again, thank you for your extremely helpful answer. I'll try my luck a third
> time with a similar but different question: assuming I have two dataframes,
> a and b, each of which containing a set of start and end coordinates, and I
> want to create a new dataframe c, which contains all coordinates in a which
> are are overlapped by any coordinate in b. Can I do this via sqldf()? Or is
> there any other way to do this?
>
> Thank you very much!
> Schragi
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Wednesday, March 25, 2009 3:44 PM
> To: Schragi Schwartz
> Cc: r-help at r-project.org
> Subject: Re: [R] Merging rows in dataframes
>
> I've added an example to FAQ 3 on the home page that
> illustrates group_concat.
> http://sqldf.googlecode.com
>
> On Wed, Mar 25, 2009 at 9:06 AM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> In the Links box to the right on the sqldf home page click on
>> "SQLite - aggregate functions" and lookup group_concat.
>>
>> On Wed, Mar 25, 2009 at 9:05 AM, Schragi Schwartz
>> <schragas at post.tau.ac.il> wrote:
>>> Thank you, your answer was extremely helpful. One last problem though:
> one
>>> of the aggregate functions I'd like to apply on the columns is
>>> concatentation (equivalent to the paste() function). So if I have a given
>>> character column in three separate rows sharing the same ids with the
> value
>>> "apple" in the first, "banana" in the second, and "orange" in the third,
> in
>>> the summarizing row I'd like to receive output in the form
>>> "apple|banana|orange". Is there any way to do this?
>>>
>>> Thanks again,
>>> Schragi
>>>
>>> -----Original Message-----
>>> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
>>> Sent: Tuesday, March 24, 2009 12:50 AM
>>> To: Schraga Schwartz
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Merging rows in dataframes
>>>
>>> Using sqldf you only need two statements, infile <- file(...) and
>>> DF <- sqldf("select min(a), max(b), mean(c), ... from infile group by
> id").
>>> The file statement identifies the filename and the second reads it
>>> into sqlite (without
>>> going through R), summarizes it and then reads the summarized version
>>> into R.  You may also need to provide info on its format if its not in
> the
>>> default format.  See example 4a on home page and the other examples
>>> there:
>>> http://sqldf.googlecode.com
>>>
>>>
>>> On Mon, Mar 23, 2009 at 5:58 PM, Schraga Schwartz
>>> <schragas at post.tau.ac.il> wrote:
>>>> Hello,
>>>>
>>>> I have a dataframe with 40 columns and around 450,000 rows. The first
>>> column
>>>> in each row is a factor id and the remaining are numeric. Some rows have
>>> the
>>>> same ids. What I want to do is to merge each set of rows sharing the
> same
>>>> ids (id set) into one single row (summarizing row) with that id. To
> create
>>>> the summarizing row, I'd like to apply a different function on each of
> the
>>>> original columns in the id set. Some columns within the summarizing row
>>> will
>>>> equal the mean of the columns in the id set, others will equal the
>>> minimum,
>>>> others the maximum.
>>>>
>>>> To do this, I tried using the by() function. However, this was extremely
>>>> slow (it ran for more than two hours before I stopped it). Also, it used
>>> up
>>>> all of 16 GB of memory on my machine. Is there any more efficient
>>> function,
>>>> both in terms of time and memory, to do this sort of thing?
>>>>
>>>> Thank you very much,
>>>> Schraga Schwartz
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>
>




More information about the R-help mailing list