[R] merge multiple data frames

David M. Schruth dschruth at u.washington.edu
Wed Apr 11 00:43:25 CEST 2012


Sorry this is so late:

But you could try a "nerge" (from the 'caroline' package)

nerge(list(a,b,c))

Just have to make sure that the rows for each dataframe are renamed with 
the date columns.


On 1/30/2012 11:44 PM, Massimo Bressan wrote:
> thanks don
> I have here enough to study for a while....
> thank you for your help
> max
> ----- Original Message ----- From: "MacQueen, Don" <macqueen1 at llnl.gov>
> To: "Massimo Bressan" <mbressan at arpa.veneto.it>; <r-help at r-project.org>
> Sent: Monday, January 30, 2012 4:47 PM
> Subject: Re: [R] merge multiple data frames
>
>
>> Does this example help? It doesn't handle the problem of common field
>> names, but see below for another example.
>>
>> df1 <- data.frame(jn=1:4, a1=letters[1:4], a2=LETTERS[1:4])
>> df2 <- data.frame(jn=2:6, b1=month.abb[2:6])
>> df3 <- data.frame(jn=3:7, x=rnorm(5), y=13:17)
>>
>> dfn <- sqldf('select * from df1 left join df2 using (jn) left join df3
>> using (jn)')
>>
>> In this example, you automatically get all fields from all three data
>> frames, without having to name them in the SQL statement -- but you
>> should
>> not have common names.
>>
>>
>> To deal with common names, I myself would probably rename the
>> variables in
>> the data frames before trying to merge.
>>
>> A general method would be something like:
>> nms1 <- names(df1)
>> nms1[nms1 != 'date'] <- paste(nms1[nms1 != 'date'],'.1',sep='')
>> names(df1) <- nms1
>> Of course it has to be done for every data frame, but this can be put
>> in a
>> loop, if necessary.
>>
>>
>> However, here is an example where I have changed df1 and df2; they both
>> have a field named 'aa', in addition to the matching field.
>>
>> df1 <- data.frame(jn=1:4, aa=letters[1:4], a2=LETTERS[1:4])
>> df2 <- data.frame(jn=2:6, aa=month.abb[2:6])
>> df3 <- data.frame(jn=3:7, x=rnorm(5), y=13:17)
>>
>> dfn <- sqldf('select jn, df1.aa aa1, df2.aa aa2,
>> a2, x, y
>> from df1 left join df2 using (jn) left join df3 using (jn)')
>>
>> By the way, you can still select *, even with common names:
>>
>>
>> dfx <- sqldf('select * from df1 left join df2 using (jn) left join df3
>> using (jn)')but you might not like the result. Try it and see!
>>
>>
>>
>>
>> It's my understanding that in the current SQL definition 'as' is no
>> longer
>> required when changing field names (though it is also still allowed in
>> the
>> databases I work with, Oracle and MySQL). Perhaps sqldf does not allow
>> it.
>> I don't know.
>>
>> Hope this helps.
>>
>> -Don
>>
>>
>>
>> --
>> Don MacQueen
>>
>> Lawrence Livermore National Laboratory
>> 7000 East Ave., L-627
>> Livermore, CA 94550
>> 925-423-1062
>>
>>
>>
>>
>>
>> On 1/30/12 4:40 AM, "Massimo Bressan" <mbressan at arpa.veneto.it> wrote:
>>
>>> hi don
>>>
>>> I followed your advice about using sqldf package but the problem of
>>> labelling the fields persists;
>>> for some reasons I can not properly handle the sql 'as' statement....
>>>
>>> a_b<-sqldf("select a.*, b.* from a left join b on a.date=b.date")
>>> a_b_c<-sqldf("select a_b.*, c.* from a_b left join c on
>>> a_b.date=c.date")
>>>
>>> bye
>>>
>>> max
>>>
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: "MacQueen, Don" <macqueen1 at llnl.gov>
>>> To: "maxbre" <mbressan at arpa.veneto.it>; <r-help at r-project.org>
>>> Sent: Saturday, January 28, 2012 12:24 AM
>>> Subject: Re: [R] merge multiple data frames
>>>
>>>
>>> Not tested, but this might be a case for the sqldf package.
>>>
>>> -Don
>>>
>>> --
>>> Don MacQueen
>>>
>>> Lawrence Livermore National Laboratory
>>> 7000 East Ave., L-627
>>> Livermore, CA 94550
>>> 925-423-1062
>>>
>>>
>>>
>>>
>>>
>>> On 1/26/12 9:29 AM, "maxbre" <mbressan at arpa.veneto.it> wrote:
>>>
>>>> This is my reproducible example (three data frames: a, b, c)
>>>>
>>>> a<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
>>>> "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
>>>> "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
>>>> "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
>>>> "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
>>>> ), class = "factor"), so2 = c(0.799401398190476, 0, 0,
>>>> 0.0100453950434783,
>>>> 0.200154920565217, 0.473866969181818), nox = c(111.716109973913,
>>>> 178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435,
>>>> 110.425185027727), no = c(48.8543691516522, 88.7197448817391,
>>>> 93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364
>>>> ), no2 = c(36.8673432865217, 42.37150668, 47.53311701,
>>>> 29.3026882474783,
>>>> 49.2986070321739, 46.5978461731818), co = c(0.618856168125,
>>>> 0.996593475083333,
>>>> 0.666987416083333, 0.383437311166667, 0.281604928875, 0.155383408913043
>>>> ), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043,
>>>> 26.9122200013043, 13.8421695947826, 12.3788847045455), ipa =
>>>> c(167.541954974667,
>>>> 252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667,
>>>> 173.868599272609), ws = c(1.47191016429167, 0.765781205208333,
>>>> 0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652
>>>> ), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214,
>>>> 319.753674830936, 33.8713897347193, 228.368119533759), temp =
>>>> c(7.91972825883333,
>>>> 3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722,
>>>> 3.09864120704348), umr = c(86.11566638875, 94.5034087491667,
>>>> 94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087
>>>> )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
>>>> "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class =
>>>> "data.frame")
>>>>
>>>>
>>>> b<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
>>>> "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
>>>> "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
>>>> "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
>>>> "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
>>>> ), class = "factor"), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511,
>>>> 105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222
>>>> ), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972,
>>>> 16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381,
>>>> 28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917,
>>>> 0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897,
>>>> 9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971
>>>> ), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998,
>>>> 66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003,
>>>> 0.494913995,
>>>> 0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031,
>>>> 221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465,
>>>> 215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993,
>>>> 0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736,
>>>> 88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874
>>>> )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
>>>> "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class =
>>>> "data.frame")
>>>>
>>>>
>>>> c<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
>>>> "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
>>>> "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
>>>> "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
>>>> "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
>>>> ), class = "factor"), so2 = c(2.617839247, 0, 0, 0.231044086,
>>>> 0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849,
>>>> 390.0824142,
>>>> 178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498,
>>>> 221.0725021, 65.96049786, 106.541748, 119.3471241), no2 =
>>>> c(74.80145447,
>>>> 59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816
>>>> ), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745,
>>>> 0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989,
>>>> 37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978,
>>>> 406.4620056,
>>>> 432.6459961, 275.2820129, 435.7909851, 436.8039856), ws =
>>>> c(4.995530128,
>>>> 1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057
>>>> ), wd = c(58.15639877, 64.5657153143848, 39.9754269501381,
>>>> 24.0739884380921,
>>>> 55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029,
>>>> 4.33258009, 13.91609955, 8.762220383, 11.04300022), umr =
>>>> c(97.60900116,
>>>> 96.91899872, 96.20649719, 94.74620056, 82.04550171, 89.41320038
>>>> )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
>>>> "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class =
>>>> "data.frame")
>>>>
>>>>
>>>> Given the three data frames ³a², ³b² and ³c², I need to merge them
>>>> all by
>>>> the common field ³date².
>>>> The following attempt is working fine butŠ
>>>>
>>>> # code start
>>>> all<-merge(a,b,by="date",suffixes=c(".a",".b"),all=T)
>>>> all<-merge(all,c,by="date",all=T)
>>>> # code end
>>>>
>>>> ŠI would like to possibly do it in just ³one shot² (a generalisation of
>>>> the
>>>> code for a more complex case of handling many data frames) also by
>>>> assigning
>>>> proper suffixes to each variable (not possible with the previous code
>>>> snippet)
>>>>
>>>> Then I also try a different approach with the use of the library
>>>> reshape
>>>> and
>>>> the function ³merge_all² butŠ
>>>>
>>>> # code start
>>>> library("reshape")
>>>> all.new<-merge_all(a, b, c, by="date", all=T,
>>>> suffixes=c(".a",".b",".c"))
>>>> # code end
>>>>
>>>> ŠI got the following error message:
>>>> error in merge in merge.data.frame(as.data.frame(x), as.data.frame(y),
>>>> ...)
>>>> :
>>>> formal argument "all" associated to different argument passed
>>>> (a free translation from italian)
>>>>
>>>> My question is:
>>>> how to accomplish the merging of multiple data frames with all the same
>>>> variable names and by a single id field with the need of ³keeping
>>>> track²
>>>> of
>>>> the original data frame by means of the use of suffixes appended to new
>>>> variables?
>>>>
>>>> Any help much appreciated
>>>>
>>>> Thank you
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/merge-multiple-data-frames-tp4331089p433108
>>>>
>>>> 9
>>>> .html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list