[R] Merge data frames but prefer values in one

JiHO jo.lists at gmail.com
Thu Sep 10 17:21:12 CEST 2009


Hello everyone,

My problem is better explained with an example:

 > x=data.frame(a=1:4,b=1:4,c=rnorm(4))
 > x
  a b          c
1 1 1 -0.8821089
2 2 2 -0.7082583
3 3 3 -0.5948835
4 4 4 -1.8571443
 > y=data.frame(a=c(1,3),b=3,c=rnorm(2))
 > y
  a b            c
1 1 3 -0.273155973
2 3 3  0.009517862

Now I want to merge x and y by columns a and b, hence creating a  
data.frame with all a:b combinations observed in x and y. That's  
easily done with merge:

 > merge(x,y,by=c("a","b"),all=T)
  a b        c.x          c.y
1 1 1 -0.8821089           NA
2 1 3         NA -0.273155973
3 2 2 -0.7082583           NA
4 3 3 -0.5948835  0.009517862
5 4 4 -1.8571443           NA

But rather than two c columns I would want the merge to:
- keep the value in x if there is no corresponding value in y
- keep the value in y if there is no corresponding value in x
- prefer the value in y when the a:b combination exists in both x and y

So basically I want my result to look like:
  a b          c
1 1 1 -0.8821089
2 1 3 -0.2731559
3 2 2 -0.7082583
4 3 3  0.0095178
5 4 4 -1.8571443

I can't find a combinations of options for merge that does this. Is  
there another fonction that would do that or do I have to resort to  
some post-processing after merge? It seems that it might be something  
like a "right merge" for data bases but I don't know this world at  
all. I would be happy to look into sqldf if that allows to do things  
like that.

Thanks in advance. Sincerely,

JiHO
---
http://maururu.net




More information about the R-help mailing list