[R] help with merging 2 data frames

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Thu Jul 12 00:28:51 CEST 2012


Dear R-ers,

I feel I am close, but can't get it quite right.
Thanks a lot for your help!

Dimitri

# I have 2 data frames:

x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
(x);(y)

# I'd like to merge them so that the result looks like this:

desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
	e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
(desired)

# In other words, I want column e1 and e2 entries from data frame y to
be repeated based on matching of column a from x and columns a2 and
then a3 from y.

# I am trying step-by-step - first I am using column a2 from data
frame y for merging:
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
(out1)   # looking good - half of the job is done

# Step2 - does not work

# next line produces columns e1 and e2 twice (in real life I have tons
of columns like e1 and e2):
merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)

# next line also doesn't do the job:
merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)

# Finally, I tried this approach:
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
(out1); (out2)

# Now I need to merge these 2 - however, the next line doubles the
number of entries:
merge(out1,out2,by=names(out1),all.x=T,all.y=T)

-- 
Dimitri Liakhovitski
marketfusionanalytics.com



More information about the R-help mailing list