[R] Is there a package that can do Fuzzy name matching to standardize names in a single column

Gregg Powell g@@@powe|| @end|ng |rom protonm@||@com
Wed Jun 15 17:43:14 CEST 2022


Hello Ashim, and thank you for taking the time to reply.


> library(fuzzyjoin)
> ?stringdist_left_join

This will join two tables, but what I am trying to do is standardize the similarly spelled duplicate names in the first column of a single table.

I don't think fuzzyjoin will help me in that regard.
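
For reference, here is a minimal sketch of the single-column standardization described above, using only base R (adist, hclust, cutree). The function name standardize_names and its max_dist argument are hypothetical and not taken from any package. The sketch assumes names are written "First [Middle] Last", optionally followed by a comma and credentials such as ", RN", and that there are no missing values. It compares names on their first and last words only, groups keys that lie within a small edit distance of each other, and replaces each name with the most frequent spelling in its group.

```r
# Hypothetical helper (not from any package): standardize name spellings
# within a single character vector.
standardize_names <- function(x, max_dist = 1) {
  # Assumption: anything after a comma is a credential such as ", RN".
  # Drop it, then collapse repeated whitespace.
  cleaned <- trimws(gsub("\\s+", " ", sub(",.*$", "", x)))

  # Matching key: first and last word only, lower-cased, so that middle
  # names and initials do not affect the comparison.
  key <- tolower(sub("^(\\S+).*\\s(\\S+)$", "\\1 \\2", cleaned, perl = TRUE))

  # Cluster the distinct keys by edit distance (single linkage), so that
  # keys within max_dist edits of each other end up in the same group.
  ukeys <- unique(key)
  if (length(ukeys) > 1) {
    tree <- hclust(as.dist(adist(ukeys)), method = "single")
    cluster <- cutree(tree, h = max_dist)[match(key, ukeys)]
  } else {
    cluster <- rep(1L, length(key))
  }

  # Within each group, keep the most frequent cleaned spelling;
  # ties go to the spelling that appears first in the data.
  pick <- function(v) {
    counts <- table(factor(v, levels = unique(v)))
    names(counts)[which.max(counts)]
  }
  ave(cleaned, cluster, FUN = pick)
}

# Example names from the original post.
x <- c("John Good", "Joe Jackson", "Bob A. Barker",
       "John B. Good", "Joe J. Jackson", "Bob Allen Barker",
       "John Good", "Joe Jack Jackson", "Bob Barker")
print(data.frame(original = x, standardized = standardize_names(x)))
```

On these example names, the two exact "John Good" entries outvote "John B. Good". The Jackson and Barker variants each occur only once, so the tie goes to whichever spelling appears first; a different tie-break rule (for example, preferring the spelling with a middle initial) would be needed to reproduce the target table exactly. Single linkage can chain different people with similar names into one group, so max_dist should stay small and the result should be spot-checked.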

Thanks.
Gregg
Arizona, USA

------- Original Message -------
On Wednesday, June 15th, 2022 at 8:04 AM, Ashim Kapoor <ashimkapoor using gmail.com> wrote:


> Dear Gregg,
>
> Check this out:
>
> library(fuzzyjoin)
> ?stringdist_left_join
>
> Best Regards,
> Ashim
>
> On Wed, Jun 15, 2022 at 8:28 PM Gregg Powell via R-help
> r-help using r-project.org wrote:
>
> > I have data sets with names in the first column, client names in the second, and client start dates in the third.
> >
> > There are thousands of these records with thousands of names/clients/client start dates. The name is entered each time the person begins with a new client, so each person has many entries in the name column. Often the names were not entered consistently: with or without a middle initial or middle name, or with various abbreviations such as ",RN" at the end of the name.
> >
> > Is there a package that can do fuzzy name matching so that the names in the name column get replaced with a "standardized" format - where some type of machine learning can pick the most common spelling of each repeated name and replace the different variations with the common spelling?
> >
> > I included an example below. The first table includes the names with the various spellings. The second table depicts what I hope to achieve.
> >
> > Again - this is on a large scale - there are something like 10,000 records with names that need to be standardized.
> >
> > Name              Client    Client Start Date
> > John Good         Client 1  1/1/2020
> > Joe Jackson       Client 2  6/1/2020
> > Bob A. Barker     Client 3  8/1/2020
> > John B. Good      Client 4  10/1/2020
> > Joe J. Jackson    Client 5  12/1/2020
> > Bob Allen Barker  Client 6  1/1/2021
> > John Good         Client 7  5/1/2021
> > Joe Jack Jackson  Client 8  8/1/2021
> > Bob Barker        Client 9  12/1/2021
> >
> > Name              Client    Client Start Date
> > John Good         Client 1  1/1/2020
> > Joe J. Jackson    Client 2  6/1/2020
> > Bob A. Barker     Client 3  8/1/2020
> > John Good         Client 4  10/1/2020
> > Joe J. Jackson    Client 5  12/1/2020
> > Bob A. Barker     Client 6  1/1/2021
> > John Good         Client 7  5/1/2021
> > Joe J. Jackson    Client 8  8/1/2021
> > Bob A. Barker     Client 9  12/1/2021
> >
> > THANKS!
> >
> > Gregg Powell
> > Arizona, USA