[R] merging or joining 2 dataframes: merge, rbind.fill, etc.?

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Wed Feb 27 03:59:08 CET 2013


merge and rbind have very different memory usage profiles. If you append thousands of small data frames one at a time, each rbind or merge copies the entire accumulated result, which is slow and tends to fragment memory. You can avoid this by storing all of your small data frames in a list first, and then combining the list in a single step, for example with do.call(rbind, ...) (base, once the columns match) or ldply or rbind.fill (plyr), to form the large data frame all at once.
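
The list-then-combine idea above can be sketched as follows. This is only a minimal illustration, not a tuned solution: the data in `pieces` and the helper `pad` are made up for the example, and the base R step uses do.call(rbind, ...) after padding every piece to a common set of columns. If plyr is installed, rbind.fill should do the padding for you when given the list.

```r
# Hypothetical example data: small data frames whose columns differ.
pieces <- list(
  data.frame(a = 7, b = 99, d = 12),
  data.frame(d = 88, b = 34, x = 12, y = 44, c = 56),
  data.frame(a = 1, c = 2)
)

# Every column name that occurs in any piece, in order of first appearance.
all_cols <- unique(unlist(lapply(pieces, names)))

# Hypothetical helper: add the columns a piece lacks (filled with NA)
# and put all columns in the same order.
pad <- function(df) {
  absent <- setdiff(all_cols, names(df))
  if (length(absent) > 0) df[absent] <- NA
  df[all_cols]
}

# Combine everything in one call instead of growing a data frame in a loop.
big <- do.call(rbind, lapply(pieces, pad))
print(big)

# Optional alternative, only if the plyr package is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  big2 <- plyr::rbind.fill(pieces)
}
```

The point of the design is that the large data frame is allocated once, at the end, rather than being copied every time a small piece is appended.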
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Anika Masters <anika.masters at gmail.com> wrote:

>Thanks Arun and David.  Another issue I am running into is memory
>usage when one of the data frames I'm trying to rbind to or merge
>with is "very large".  (This is a repetitive problem, as I am trying
>to merge/rbind thousands of small dataframes into a single "very
>large" dataframe.)
>
>
>
>I'm thinking of creating a function that creates an empty dataframe to
>which I can add data, but will need to first determine and ensure that
>each dataframe has the exact same columns, in the exact same
>"location".
>
>
>
>Before I write any new code, are there any pre-existing functions or
>code that might solve this problem of merging small or medium-sized
>dataframes with a "very large" dataframe?
>
>On Tue, Feb 26, 2013 at 2:00 PM, David L Carlson <dcarlson at tamu.edu>
>wrote:
>> Clumsy but it doesn't require any packages:
>>
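>> merge2 <- function(x, y) {
>>   # Same set of column names: stack the rows (rbind matches columns by name).
>>   if (setequal(names(x), names(y))) {
>>     rbind(x, y)
>>   } else {
>>     # Otherwise merge, keeping unmatched rows from both sides.
>>     merge(x, y, all = TRUE)
>>   }
>> }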
>> merge2(df1, df2)
>> df3 <- df1
>> merge2(df1, df3)
>>
>> ----------------------------------------------
>> David L Carlson
>> Associate Professor of Anthropology
>> Texas A&M University
>> College Station, TX 77843-4352
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of arun
>>> Sent: Tuesday, February 26, 2013 1:14 PM
>>> To: Anika Masters
>>> Cc: R help
>>> Subject: Re: [R] merging or joining 2 dataframes: merge, rbind.fill,
>>> etc.?
>>>
>>> Hi,
>>>
>>> You could also try:
>>> library(gtools)
>>> smartbind(df2,df1)
>>> #  a  b  d
>>> #1 7 99 12
>>> #2 7 99 12
>>>
>>>
>>> When df1!=df2
>>> smartbind(df1,df2)
>>> #   a  b  d  x  y  c
>>> #1  7 99 12 NA NA NA
>>> #2 NA 34 88 12 44 56
>>> A.K.
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: Anika Masters <anika.masters at gmail.com>
>>> To: r-help at r-project.org
>>> Cc:
>>> Sent: Tuesday, February 26, 2013 1:55 PM
>>> Subject: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.?
>>>
>>> #I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd
>>> (mydf).  I want the 3rd dataframe to contain 1 row for each row in
>>> df1 & df2, and all the columns in both df1 & df2. The solution should
>>> "work" even if the 2 dataframes are identical, and even if the 2
>>> dataframes do not have the same column names.  The rbind.fill
>>> function seems to work.  For learning purposes, are there other
>>> "good" ways to solve this problem, using merge or other functions
>>> other than rbind.fill?
>>>
>>> #e.g. These 3 examples all seem to "work" correctly and as I hoped:
>>>
>>> df1 <- data.frame(matrix(data=c(7, 99, 12) ,  nrow=1 ,  dimnames =
>>> list( NULL ,  c('a' , 'b' , 'd') ) ) )
>>> df2 <- data.frame(matrix(data=c(88, 34, 12, 44, 56) ,  nrow=1 ,
>>> dimnames = list( NULL ,  c('d' , 'b' , 'x' ,  'y', 'c') ) ) )
>>> mydf <- merge(df2, df1, all.y=T, all.x=T)
>>> mydf
>>>
>>> #e.g. this works:
>>> library(reshape)
>>> mydf <- rbind.fill(df1, df2)
>>> mydf
>>>
>>> #This works:
>>> library(reshape)
>>> mydf <- rbind.fill(df1, df2)
>>> mydf
>>>
>>> #But this does not (the 2 dataframes are identical)
>>> df1 <- data.frame(matrix(data=c(7, 99, 12) ,  nrow=1 ,  dimnames =
>>> list( NULL ,  c('a' , 'b' , 'd') ) ) )
>>> df2 <- df1
>>> mydf <- merge(df2, df1, all.y=T, all.x=T)
>>> mydf
>>>
>>> #Any way to get "merge" to work for this final example? Any other
>>> good solutions?
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>


