[R] merge multiple data frames

Massimo Bressan mbressan at arpa.veneto.it
Fri Jan 27 09:19:58 CET 2012


I tested your code: it works, but there is still a problem with the
suffixes for the last data frame.
Thank you for the support.
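
The last data frame stays unsuffixed because merge() only applies its
suffixes to column names that clash between its two inputs. By the time
the third data frame is merged, the earlier columns already read so2.a,
so2.b and so on, so nothing clashes and the new columns keep their bare
names. One possible workaround, sketched here, is to rename the non-key
columns before merging. The function name mergeAllSuffixed and the small
toy data frames are made up for illustration; they are not from the thread.

```r
# Hypothetical helper (not from the thread): tag every non-key column with
# the argument name of its data frame, then merge the tagged frames in turn.
mergeAllSuffixed <- function(..., by = "date", all = TRUE) {
  dfs <- list(...)
  # Every data frame must be passed with a name, e.g. a = toyA
  stopifnot(!is.null(names(dfs)), !any(names(dfs) == ""))
  tagged <- Map(function(df, nm) {
    isKey <- names(df) %in% by
    names(df)[!isKey] <- paste(names(df)[!isKey], nm, sep = ".")
    df
  }, dfs, names(dfs))
  # No name clashes remain, so merge() needs no suffixes argument
  Reduce(function(x, y) merge(x, y, by = by, all = all), tagged)
}

# Toy data (assumed for illustration only)
toyA <- data.frame(date = c("2012-01-03", "2012-01-04"),
                   so2 = c(0.8, 0), nox = c(111.7, 178.1),
                   stringsAsFactors = FALSE)
toyB <- data.frame(date = c("2012-01-04", "2012-01-05"),
                   so2 = c(0, 0), nox = c(105.8, 61.2),
                   stringsAsFactors = FALSE)
toyC <- data.frame(date = c("2012-01-05", "2012-01-06"),
                   so2 = c(0, 0.2), nox = c(390.1, 178.7),
                   stringsAsFactors = FALSE)

merged <- mergeAllSuffixed(a = toyA, b = toyB, c = toyC)
names(merged)
# [1] "date"  "so2.a" "nox.a" "so2.b" "nox.b" "so2.c" "nox.c"
```

Because the renaming happens per data frame, the result does not depend on
how many frames are passed or on the order in which they are merged.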


----- Original Message ----- 
From: "R. Michael Weylandt" <michael.weylandt at gmail.com>
To: "maxbre" <mbressan at arpa.veneto.it>
Cc: <r-help at r-project.org>
Sent: Thursday, January 26, 2012 8:19 PM
Subject: Re: [R] merge multiple data frames


I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y)
    merge(x, y, by = by, all = all,
          suffixes = paste(".", names(dotArgs), sep = "")),
    dotArgs)
}

mergeAll(a = a, b = b, c = c)

str(.Last.value)

You might also be able to set it up to capture the names with
substitute(), so that you don't have to write "a = a" etc.
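
A rough sketch of that substitute() idea, assuming a hypothetical helper
called namedDots (not part of the original message): it reads the
unevaluated argument expressions and uses them as the list names. The toy
data frames are also made up for illustration.

```r
# Hypothetical helper: return the objects passed through `...` as a list
# named after the expressions used in the call.
namedDots <- function(...) {
  # substitute(list(...)) yields the unevaluated call, e.g. list(toyA, toyB);
  # dropping its first element (the symbol `list`) leaves the arguments.
  exprs <- as.list(substitute(list(...)))[-1L]
  labels <- vapply(exprs, function(e) deparse(e)[1L], character(1))
  setNames(list(...), unname(labels))
}

# Toy data (assumed for illustration only)
toyA <- data.frame(date = "2012-01-03", so2 = 0.8)
toyB <- data.frame(date = "2012-01-03", so2 = 0)

captured <- namedDots(toyA, toyB)
names(captured)
# [1] "toyA" "toyB"
```

With data frames called a, b and c, do.call(mergeAll, namedDots(a, b, c))
should then behave like mergeAll(a = a, b = b, c = c).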

On Thu, Jan 26, 2012 at 12:29 PM, maxbre <mbressan at arpa.veneto.it> wrote:
> This is my reproducible example (three data frames: a, b, c)
>
> a<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
> "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
> "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
> "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
> "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
> ), class = "factor"), so2 = c(0.799401398190476, 0, 0, 0.0100453950434783,
> 0.200154920565217, 0.473866969181818), nox = c(111.716109973913,
> 178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435,
> 110.425185027727), no = c(48.8543691516522, 88.7197448817391,
> 93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364
> ), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783,
> 49.2986070321739, 46.5978461731818), co = c(0.618856168125,
> 0.996593475083333,
> 0.666987416083333, 0.383437311166667, 0.281604928875, 0.155383408913043
> ), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043,
> 26.9122200013043, 13.8421695947826, 12.3788847045455), ipa =
> c(167.541954974667,
> 252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667,
> 173.868599272609), ws = c(1.47191016429167, 0.765781205208333,
> 0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652
> ), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214,
> 319.753674830936, 33.8713897347193, 228.368119533759), temp =
> c(7.91972825883333,
> 3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722,
> 3.09864120704348), umr = c(86.11566638875, 94.5034087491667,
> 94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087
> )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
> "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class =
> "data.frame")
>
>
> b<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
> "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
> "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
> "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
> "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
> ), class = "factor"), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511,
> 105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222
> ), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972,
> 16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381,
> 28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917,
> 0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897,
> 9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971
> ), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998,
> 66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995,
> 0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031,
> 221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465,
> 215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993,
> 0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736,
> 88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874
> )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
> "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class =
> "data.frame")
>
>
> c<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
> "2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
> "2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
> "2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
> "2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
> ), class = "factor"), so2 = c(2.617839247, 0, 0, 0.231044086,
> 0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142,
> 178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498,
> 221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447,
> 59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816
> ), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745,
> 0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989,
> 37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056,
> 432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128,
> 1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057
> ), wd = c(58.15639877, 64.5657153143848, 39.9754269501381, 
> 24.0739884380921,
> 55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029,
> 4.33258009, 13.91609955, 8.762220383, 11.04300022), umr = c(97.60900116,
> 96.91899872, 96.20649719, 94.74620056, 82.04550171, 89.41320038
> )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
> "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class =
> "data.frame")
>
>
> Given the three data frames “a”, “b” and “c”, I need to merge them all by
> the common field “date”.
> The following attempt is working fine but…
>
> # code start
> all<-merge(a,b,by="date",suffixes=c(".a",".b"),all=T)
> all<-merge(all,c,by="date",all=T)
> # code end
>
> …I would like to do it in just “one shot” (generalising the code to the
> more complex case of many data frames) while also assigning a proper
> suffix to each variable (not possible with the previous code snippet).
>
> Then I also tried a different approach, using the reshape library and its
> function “merge_all”, but…
>
> # code start
> library("reshape")
> all.new<-merge_all(a, b, c, by="date", all=T, suffixes=c(".a",".b",".c"))
> # code end
>
> …I got the following error message:
> error in merge in merge.data.frame(as.data.frame(x), as.data.frame(y), ...) :
> formal argument "all" associated to different argument passed
> (a free translation from Italian)
>
> My question is:
> how can I merge multiple data frames that all share the same variable
> names, by a single id field, while “keeping track” of the original data
> frame by means of suffixes appended to the new variables?
>
> Any help much appreciated
>
> Thank you


