[R] Is there a package that can do Fuzzy name matching to standardize names in a single column

Gregg Powell g@@@powe|| @end|ng |rom protonm@||@com
Wed Jun 15 16:57:37 CEST 2022


I have data sets with person names in the first column, client names in the second, and the client start date in the third.

There are thousands of these records with thousands of names/clients/client start dates. The name is entered each time the person begins with a new client, so each person has many entries in the name column. Often the names were not entered consistently: with or without a middle initial or middle name, or with various abbreviations such as ",RN" at the end of the name.
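As a rough illustration of the inconsistencies described above, here is a minimal base R sketch that reduces each raw name to a "first last" matching key. The helper name `name_key` is hypothetical, and the sketch assumes names are written "First [Middle] Last[,Credential]" with at least two words; other layouts (for example "Last, First") would need different rules.

```r
# Hypothetical helper: reduce a raw name to a lowercase "first last" key.
# Assumes "First [Middle] Last[,Credential]" with at least two words.
name_key <- function(x) {
  x <- sub(",.*$", "", x)              # drop trailing credentials such as ",RN"
  x <- tolower(trimws(x))
  x <- gsub("[[:punct:]]", "", x)      # drop periods after initials
  parts <- strsplit(x, "[[:space:]]+")
  # keep only the first and last word, ignoring middle names/initials
  vapply(parts, function(p) paste(p[1], p[length(p)]), character(1))
}

name_key(c("John B. Good", "John Good,RN", "Bob Allen Barker"))
# "john good" "john good" "bob barker"
```

Variants of the same person then share a key, although two different people with the same first and last name would also collapse to one key.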

Is there a package that can do fuzzy name matching so that the names in the name column get replaced with a "standardized" format, where some type of machine learning picks the most common spelling of each repeated name and replaces the different variations with that spelling?

I included an example below. The first table shows the names with the various spellings. The second table shows what I hope to achieve.

Again - this is on a large scale - there are something like 10,000 records with names that need to be standardized.


Name               Client     Client Start Date
John Good          Client 1   1/1/2020
Joe Jackson        Client 2   6/1/2020
Bob A. Barker      Client 3   8/1/2020
John B. Good       Client 4   10/1/2020
Joe J. Jackson     Client 5   12/1/2020
Bob Allen Barker   Client 6   1/1/2021
John Good          Client 7   5/1/2021
Joe Jack Jackson   Client 8   8/1/2021
Bob Barker         Client 9   12/1/2021

Name               Client     Client Start Date
John Good          Client 1   1/1/2020
Joe J. Jackson     Client 2   6/1/2020
Bob A. Barker      Client 3   8/1/2020
John Good          Client 4   10/1/2020
Joe J. Jackson     Client 5   12/1/2020
Bob A. Barker      Client 6   1/1/2021
John Good          Client 7   5/1/2021
Joe J. Jackson     Client 8   8/1/2021
Bob A. Barker      Client 9   12/1/2021
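One possible way to get from the first table to something like the second, using only base R, is sketched below. It is not a package recommendation and involves no machine learning. It clusters matching keys by edit distance (`adist`, `hclust`, `cutree`) and relabels each cluster with its most frequent original spelling. The names `standardize_names`, `key` and `max_dist`, the threshold of 2 edits, and the first/last-word key are all assumptions made for illustration.

```r
# Sketch: group name variants whose keys are within `max_dist` edits of
# each other, then relabel each group with its most frequent spelling.
# `standardize_names`, `key` and `max_dist` are hypothetical names.
standardize_names <- function(x, key = tolower(x), max_dist = 2) {
  ukey <- unique(key)
  if (length(ukey) > 1) {
    d   <- as.dist(adist(ukey))                  # pairwise edit distances
    grp <- cutree(hclust(d, method = "complete"), h = max_dist)
  } else {
    grp <- 1L
  }
  cluster <- grp[match(key, ukey)]               # cluster id for every row
  # most frequent original spelling per cluster (ties: first one seen)
  modal <- tapply(x, cluster, function(v) {
    tab <- table(factor(v, levels = unique(v)))
    names(tab)[which.max(tab)]
  })
  as.vector(modal[as.character(cluster)])
}

nm <- c("John Good", "Joe Jackson", "Bob A. Barker",
        "John B. Good", "Joe J. Jackson", "Bob Allen Barker",
        "John Good", "Joe Jack Jackson", "Bob Barker")

# Crude first/last-word key: an assumption about which parts identify a person.
words <- strsplit(tolower(gsub("[[:punct:]]", "", nm)), "[[:space:]]+")
key   <- vapply(words, function(p) paste(p[1], p[length(p)]), character(1))

out <- standardize_names(nm, key)
```

On this example the three John variants collapse to "John Good". The Joe and Bob groups are three-way ties, and this sketch breaks ties by first occurrence, so it yields "Joe Jackson" rather than the "Joe J. Jackson" shown in the second table; a different tie rule would be needed to reproduce that choice. Note also that `adist` builds a full pairwise matrix over the unique keys, which can become memory-hungry with many thousands of distinct names, and that a loose `max_dist` can merge genuinely different people, so the resulting groups would need manual review.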



THANKS!

Gregg Powell

Arizona, USA