[R] library/function to compare two phrases?

R. Michael Weylandt michael.weylandt at gmail.com
Sun Nov 18 00:20:44 CET 2012


On Sat, Nov 17, 2012 at 11:00 PM, Brian Feeny <bfeeny at mac.com> wrote:
> I am looking for a library/function in R that can compare two phrases and give me a score, or otherwise match them up as accurately as possible.
>
> The "phrases" are obfuscated/messy.  I am not concerned about which is "correct" (for example spell checking), I am only concerned with pairing
> each phrase with its closest match.
>
> Example:
>
> I have ROW1 and ROW2 like so:
>
> ROW1                  ROW2
> hamburger helper      bigmc heartkcatta
> chicken nuggets       chicke, nuggets, jss
> bigmac heartattack    some sombody somehwere
> somebody somehwere    repleh regrubmah
>
> I am looking for something that can tell me that the best match for hamburger helper is repleh regrubmah, and the same for each other row.
>
> So my goal is to write a program that, for each phrase in ROW1, runs this function against every phrase in ROW2 and gives me the phrase that scored best.
>
> I have read over many of the NLP packages listed at http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>
> I thought lsa might be a good fit, but I am not sure.  I have limited time, so I am hoping someone can point me in a direction of what I am looking for.
>
> I have been searching for "text classifiers", but perhaps this problem is referred to as something else.
>

This is outside my expertise, but if memory serves, you might benefit
from reading up on the Levenshtein distance (also called edit distance),
which allows this sort of fuzzy matching of strings.
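
A minimal sketch of that idea, using adist() from the utils package
that ships with R, which computes a generalized Levenshtein distance
between two character vectors. The data are copied from your example.
The reverse_string() helper and the extra comparison against reversed
candidates are my own assumptions, added only because one of your
phrases ("repleh regrubmah") is spelled backwards; plain edit distance
would not pick that up by itself. Treat this as a starting point, not
a tested solution for real data.

```r
# Phrases copied from the example in the question.
row1 <- c("hamburger helper", "chicken nuggets",
          "bigmac heartattack", "somebody somehwere")
row2 <- c("bigmc heartkcatta", "chicke, nuggets, jss",
          "some sombody somehwere", "repleh regrubmah")

# adist() returns a matrix of generalized Levenshtein distances:
# rows correspond to row1, columns to row2.
d <- adist(row1, row2)

# Hypothetical helper (not part of R): reverse each string character by
# character, so that reversed phrases can be compared as well.
reverse_string <- function(s) {
  vapply(strsplit(s, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

# Distance to each candidate spelled backwards, then the element-wise
# minimum of the forward and reversed distances.
d_rev <- adist(row1, reverse_string(row2))
d_min <- pmin(d, d_rev)

# For each phrase in row1, pick the row2 phrase with the smallest distance
# (which.min() returns the first one in case of ties).
best <- row2[apply(d_min, 1, which.min)]
data.frame(phrase = row1, best_match = best)
```

Smaller distances mean closer matches. If your phrases differ a lot in
length, dividing each distance by the length of the longer phrase may
give a fairer score, but I have not tried that here.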

MW



