[R] How to use compare.linkage in RecordLinkage package? -- more details but problem remains

Anders Alexandersson andersalex at gmail.com
Thu Jan 28 21:01:52 CET 2016


How does one link two datasets using the compare.linkage function in the
RecordLinkage package? This follows up on my original posting from earlier
today:
https://stat.ethz.ch/pipermail/r-help/2016-January/435736.html

I suggested then that I should perhaps have added the identity arguments
(identity1 and identity2). But when I add them, I unexpectedly get 5 matches,
47885 non-matches and 0 pairs with unknown status. For example, row 4256 is
flagged as a match even though the matching variable bm does not agree: bm is
0 in the result pair, because bm is 1 for BERND JUNG and 4 for BERND MUELLER.
Also, is_match in row 1 changes from unknown (NA) to non-match (0), which is
unexpected since bm agrees there (bm=1 in the pair; both records have bm 9).

Here are the major new R commands that I ran and the output:
> rpairs <- compare.linkage(RLdata500, RLdata10000, blockfld=c(1),
+   identity1=identity.RLdata500, identity2=identity.RLdata10000,
+   exclude=c(2:5,7))
> subset(rpairs$pairs, is_match=="1") # Why these 5 matches?
      id1  id2 fname_c1 bm is_match
4256   59 1394        1  0        1
5811  174 3684        1  0        1
14699 139 4199        1  0        1
16453  92 4580        1  0        1
21840  73  737        1  0        1
> RLdata500[c(17, 59), ] # first obs, and first matching obs
    fname_c1 fname_c2 lname_c1 lname_c2   by bm bd
17 ALEXANDER     <NA>  MUELLER     <NA> 1974  9  9
59     BERND     <NA>     JUNG    KLEIN 1935  1 14
> RLdata10000[c(343, 1394), ] # first obs, and first matching obs
      fname_c1 fname_c2 lname_c1 lname_c2   by bm bd
343  ALEXANDER     <NA>  BAUMANN     <NA> 1957  9  7
1394     BERND     <NA>  MUELLER     <NA> 1942  4  4
> rpairs$pairs[1:2, ] # list first 2 pairs
  id1  id2 fname_c1 bm is_match
1  17  343        1  1        0
2  17 2385        1  0        0
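
A possible diagnostic, under the assumption (not confirmed in this thread) that
is_match is derived only from the identity vectors, i.e. a pair is flagged as a
match when identity1[id1] equals identity2[id2], independently of the compared
fields. The sketch below simply prints the identity values for the two pairs
discussed above so they can be compared by eye:

```r
library(RecordLinkage)
data(RLdata500)    # also loads identity.RLdata500
data(RLdata10000)  # also loads identity.RLdata10000

# Identity values for the first pair flagged as a match (id1 = 59, id2 = 1394)
identity.RLdata500[59]
identity.RLdata10000[1394]

# Identity values for the first listed pair (id1 = 17, id2 = 343)
identity.RLdata500[17]
identity.RLdata10000[343]
```

If the two values agree for the first pair and differ for the second, that
would be consistent with the assumption above, and it would also suggest that
the two identity vectors do not refer to a common set of persons.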

What am I missing? How does one probabilistically link two datasets using the
compare.linkage function in the RecordLinkage package?

Anders Alexandersson
andersalex at gmail.com
