[R] Need to compare two columns in two data.frames and return all rows from df where rows values are missing

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Tue Jun 15 04:50:03 CEST 2021


merge(..., all = TRUE) essentially does this for you. Rows with NAs in score.x or
score.y are the non-matches:

merge(df1,df2,by = "name",all = TRUE)
   name    score.x   score.y
1     a 0.69280341        NA
2     b 0.04205953        NA
3     b 0.32792072        NA
4     b 0.89982497        NA
5     b 0.95450365        NA
6     b 0.99426978        NA
7     c 0.67757064 0.4659625
8     c 0.67757064 0.4137243
9     c 0.67757064 0.7584595
10    c 0.24608773 0.4659625
11    c 0.24608773 0.4137243
12    c 0.24608773 0.7584595
13    c 0.10292468 0.4659625
14    c 0.10292468 0.4137243
15    c 0.10292468 0.7584595
16    c 0.88953932 0.4659625
17    c 0.88953932 0.4137243
18    c 0.88953932 0.7584595
19    c 0.57263340 0.4659625
20    c 0.57263340 0.4137243
21    c 0.57263340 0.7584595
22    d 0.64050681 0.2316258
23    d 0.64050681 0.4145463
24    d 0.64050681 0.2330341
25    d 0.64050681 0.3688455
26    e         NA 0.1428000
27    e         NA 0.2164079
28    e         NA 0.1524447
29    f         NA 0.3181810
30    f         NA 0.1388061
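One way to go from the merged table to just the non-matching rows is to filter it after the join. This is a minimal sketch that rebuilds df1 and df2 as in Jake's message. The in_df1 and in_df2 flag columns are hypothetical helpers added here so that a genuine NA in score (none occur in this toy data) cannot be mistaken for a non-match.

```r
set.seed(123)
df1 <- data.frame(name = sample(letters[1:4], 12, TRUE), score = runif(12))
df2 <- data.frame(name = sample(letters[3:6], 12, TRUE), score = runif(12))

# Hypothetical flag columns: TRUE for every row that came from that data frame
df1$in_df1 <- TRUE
df2$in_df2 <- TRUE

# Full outer join on "name"; the score columns get the default .x / .y suffixes
m <- merge(df1, df2, by = "name", all = TRUE)

# After an outer join a flag is NA exactly when that side had no row for the key
nonmatch <- m[is.na(m$in_df1) | is.na(m$in_df2), c("name", "score.x", "score.y")]
print(nonmatch)
```

Filtering after the join still builds every matched combination first. In the output above, the 5 "c" rows of df1 against the 3 in df2 give 15 rows, and "d" gives 1 times 4 = 4, all of which the filter then discards. With many repeated keys, the %in% approach in the quoted message avoids building that intermediate table.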

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Mon, Jun 14, 2021 at 7:09 PM Jake Elmstedt <jake.elmstedt using gmail.com> wrote:

> set.seed(123)
>
> df1 <- data.frame(name = sample(letters[1:4], 12, TRUE), score = runif(12))
> head(df1)
> #>   name      score
> #> 1    c 0.67757064
> #> 2    c 0.57263340
> #> 3    c 0.10292468
> #> 4    b 0.89982497
> #> 5    c 0.24608773
> #> 6    b 0.04205953
> table(df1[["name"]])
> #>
> #> a b c d
> #> 1 5 5 1
>
> df2 <- data.frame(name = sample(letters[3:6], 12, TRUE), score = runif(12))
> head(df2)
> #>   name     score
> #> 1    c 0.7584595
> #> 2    e 0.2164079
> #> 3    f 0.3181810
> #> 4    d 0.2316258
> #> 5    e 0.1428000
> #> 6    d 0.4145463
> table(df2[["name"]])
> #>
> #> c d e f
> #> 3 4 3 2
>
> df3 <- rbind(df1[!df1[["name"]] %in% df2[["name"]], ],
>              df2[!df2[["name"]] %in% df1[["name"]], ])
> head(df3)
> #>    name      score
> #> 4     b 0.89982497
> #> 6     b 0.04205953
> #> 7     b 0.32792072
> #> 8     b 0.95450365
> #> 10    a 0.69280341
> #> 12    b 0.99426978
> table(df3[["name"]])
> #>
> #> a b e f
> #> 1 5 3 2
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
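If rows must be matched on more than one column, the same %in% idea still works when the key columns are pasted into a single string per row. This is a minimal sketch. unmatched_rows() and the data frames a and b are hypothetical, and the sketch assumes the separator "\r" never appears inside a key value.

```r
# Toy data, made up for illustration: the key is the pair (id, year)
a <- data.frame(id = c(1, 1, 2, 3), year = c(2020, 2021, 2020, 2021), value = 1:4)
b <- data.frame(id = c(1, 2, 2, 4), year = c(2021, 2020, 2021, 2020), value = 5:8)

# Hypothetical helper: rows of x whose key columns (by) have no match in y.
# With a single key column this is the same test as df1[!df1[["name"]] %in% df2[["name"]], ]
unmatched_rows <- function(x, y, by) {
  # One string per row; assumes "\r" never occurs inside a key value
  key_x <- do.call(paste, c(x[by], sep = "\r"))
  key_y <- do.call(paste, c(y[by], sep = "\r"))
  x[!key_x %in% key_y, , drop = FALSE]
}

only_a <- unmatched_rows(a, b, c("id", "year"))  # keys (1, 2020) and (3, 2021)
only_b <- unmatched_rows(b, a, c("id", "year"))  # keys (2, 2021) and (4, 2020)
rbind(only_a, only_b)
```

Stacking the two calls with rbind(), as in the quoted df3 example, gives the rows present on only one side. This requires both data frames to have the same column names.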



