[R] help with merging 2 data frames

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Thu Jul 12 00:50:17 CEST 2012


Jorge, thank you!
that seems to be working, but unfortunately in real life I have
thousands of variables (except for a, a2, a3 and b) so that manually
selecting columns (as in c(2:4, 8:9)) would be too difficult...
Dimitri

On Wed, Jul 11, 2012 at 6:36 PM, Jorge I Velez <jorgeivanvelez at gmail.com> wrote:
> Hi Dimitri,
>
> Try creating a key for "x" and "y" and then merging the result by that
> variable:
>
> x$key <- with(x, paste(a, b, sep = "/"))
> y$key <- with(y, paste(a2, b, sep = "/"))
> merge(x, y, by = 'key')[, c(2:4, 8:9)]
>
> HTH,
> Jorge.-
>
>
> On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote:
>>
>> Dear R-ers,
>>
>> I feel I am close, but can't get it quite right.
>> Thanks a lot for your help!
>>
>> Dimitri
>>
>> # I have 2 data frames:
>>
>>
>> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
>>
>> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
>> (x);(y)
>>
>> # I'd like to merge them so that the result looks like this:
>>
>>
>> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
>>
>> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
>> (desired)
>>
>> # In other words, I want column e1 and e2 entries from data frame y to
>> be repeated based on matching of column a from x and columns a2 and
>> then a3 from y.
>>
>> # I am trying step-by-step - first I am using column a2 from data
>> frame y for merging:
>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>> (out1)   # looking good - half of the job is done
>>
>> # Step2 - does not work
>>
>> # next line produces columns e1 and e2 twice (in real life I have tons
>> of columns like e1 and e2):
>> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>>
>> # next line also doesn't do the job:
>>
>> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
>>
>> # Finally, I tried this approach:
>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>> (out1); (out2)
>>
>> # Now I need to merge these 2 - however, the next line doubles the
>> number of entries:
>> merge(out1,out2,by=names(out1),all.x=T,all.y=T)
>>
>> --
>> Dimitri Liakhovitski
>> marketfusionanalytics.com
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Dimitri Liakhovitski
marketfusionanalytics.com



More information about the R-help mailing list