[R] Is there a package that can do Fuzzy name matching to standardize names in a single column

Ashim Kapoor @@h|mk@poor @end|ng |rom gm@||@com
Wed Jun 15 17:04:52 CEST 2022


Dear Gregg,

Check this out:

library(fuzzyjoin)
?stringdist_left_join

Best Regards,
Ashim

On Wed, Jun 15, 2022 at 8:28 PM Gregg Powell via R-help
<r-help using r-project.org> wrote:
>
> Have data sets where there are names, in the first column, client names in the second, and Client start date in the third.
>
> There are thousands of these records with thousands of names/clients/client start dates. The name is entered each time the person begins with a new client such that each person has many entries in the name column. Often the names were not entered in a consistent way. With and without middle initial, middle name, or various abbreviations such as ",RN" at the end of the name.
>
> Is there a package that can do fuzzy name matching so that the names in name column get replaced with a "standardized" format - where some type of machine learning can pick the most common spelling of each repeat name and replace the different variations with the common spelling?
>
> I included an example below. First table includes the names with the various spellings. Second table depicts what I hope to achieve.
>
> Again - this is on a large scale - there are something like 10,000 records with names that need to be standardized.
>
>
> Name
>
> Client
>
> Client Start Date
>
> John Good
>
> Client 1
>
> 1/1/2020
>
> Joe Jackson
>
> Client 2
>
> 6/1/2020
>
> Bob A. Barker
>
> Client 3
>
> 8/1/2020
>
> John B. Good
>
> Client 4
>
> 10/1/2020
>
> Joe J. Jackson
>
> Client 5
>
> 12/1/2020
>
> Bob Allen Barker
>
> Client 6
>
> 1/1/2021
>
> John Good
>
> Client 7
>
> 5/1/2021
>
> Joe Jack Jackson
>
> Client 8
>
> 8/1/2021
>
> Bob Barker
>
> Client 9
>
> 12/1/2021
>
>
>
>
>
>
>
> Name
>
> Client
>
> Client Start Date
>
> John Good
>
> Client 1
>
> 1/1/2020
>
> Joe J. Jackson
>
> Client 2
>
> 6/1/2020
>
> Bob A. Barker
>
> Client 3
>
> 8/1/2020
>
> John Good
>
> Client 4
>
> 10/1/2020
>
> Joe J. Jackson
>
> Client 5
>
> 12/1/2020
>
> Bob A. Barker
>
> Client 6
>
> 1/1/2021
>
> John Good
>
> Client 7
>
> 5/1/2021
>
> Joe J. Jackson
>
> Client 8
>
> 8/1/2021
>
> Bob A. Barker
>
> Client 9
>
> 12/1/2021
>
>
>
> THANKS!
>
> Gregg Powell
>
> Arizona, USA______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list