[R] merge data frames with same column names of different lengths and missing values

Domenico Vistocco vistocco at unicas.it
Sat Mar 7 11:24:56 CET 2009


Steven Lubitz wrote:
> Hello, I'm switching over from SAS to R and am having trouble merging data frames. The data frames have several columns with the same name, and each has a different number of rows. Some of the values are missing from cells with the same column names in each data frame. I had hoped that when I merged the dataframes, every column with the same name would be merged, with the value in a complete cell overwriting the value in an empty cell from the other data frame. I cannot seem to achieve this result, though I've tried several merge adaptations:
>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
>
>
> merge(x,y,by="id") #I lose observations here (n=1 in this example), and my items are duplicated - I do not want this result
>   id item1.x item2.x item1.y item2.y
> 1  1      NA       1      NA      NA
> 2  2      NA      NA       2      NA
> 3  3       3      NA      NA       3
> 4  4       4       4       4       4
> 5  5       5       5       5       5
>
>
> merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) and do not want this result
>   id item1 item2
> 1  4     4     4
> 2  5     5     5
>
>
> merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated and the NA values are retained - I instead want one row per ID
>   id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA
>   
You should obtain the desired solution using:
merge(y, x, by=c("id","item1","item2"), all=TRUE)

In database terminology all=TRUE corresponds to the full outer join, 
all.x to the left outer join and all.y to the right outer join.

Ciao,
domenico

> In reality I have multiple data frames with numerous columns, all with this problem. I can do the merge seamlessly in SAS, but am trying to learn and stick with R for my analyses. Any help would be greatly appreciated.
>
> Steve Lubitz
> Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts General Hospital
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>




More information about the R-help mailing list