[R] Similarity matching with probabilities

Hans-Joerg Bibiko bibiko at eva.mpg.de
Fri Jun 27 16:20:34 CEST 2008


On 27 Jun 2008, at 14:30, francogrex wrote:

>
> Hello,
> It's just a strange coincidence that someone posted a question about
> matching very recently. I know there are several match functions in the
> base package (such as match, pmatch, charmatch, gsub, etc.), but I can't
> seem to use them well enough to get what I need.
> Suppose I have the following strings:
> "tets"
> "estt"
> "rtes7"
> "gstes"
> "tes5t"
>
> Is there an R procedure to determine how related each string is to the
> reference string "test", for example to say that "tets" is similar to
> "test" with a probability of 0.9 or something of that sort?

Have a look at ?agrep.
One could loop over different values of max.distance to get the degree of relatedness.
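
A minimal sketch of such a loop, assuming the candidate strings from the question; min_agrep_distance is a hypothetical helper, not part of R. Note that agrep() looks for an approximate match of the pattern anywhere inside each candidate, so this measures substring closeness rather than whole-string distance.

```r
# Reference string and candidates from the question
ref   <- "test"
cands <- c("tets", "estt", "rtes7", "gstes", "tes5t")

# Hypothetical helper: for each candidate, the smallest max.distance
# (number of edits) at which agrep() reports an approximate match.
# Values >= 1 are treated by agrep() as an absolute number of edits.
min_agrep_distance <- function(pattern, x, max_d = nchar(pattern)) {
  res <- rep(NA_integer_, length(x))
  names(res) <- x
  for (d in 0:max_d) {
    hit <- agrep(pattern, x, max.distance = d)
    # record d only for candidates not already matched at a smaller d
    newly <- setdiff(hit, which(!is.na(res)))
    res[newly] <- d
  }
  res
}

min_agrep_distance(ref, cands)
```

Candidates that need fewer edits are more closely related to the reference; NA means no match within max_d edits.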

Another way is to calculate the Levenshtein (or Damerau-Levenshtein) edit
distance. A starting point could be:

http://wiki.r-project.org/rwiki/doku.php?id=tips:data-strings:levenshtein
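
A self-contained sketch in base R, not the code from the wiki page above; edit_distance and similarity are hypothetical names. It fills the classic dynamic-programming table, and with transpose = TRUE it also counts a swap of two adjacent characters as one edit (the optimal string alignment variant, a restricted form of Damerau-Levenshtein). Dividing by the longer string's length gives a score between 0 and 1; this is a normalized similarity, not a probability.

```r
# Edit distance between non-empty strings a and b via dynamic programming.
# Insertions, deletions and substitutions each cost 1. If transpose = TRUE,
# swapping two adjacent characters also costs 1 (optimal string alignment,
# a restricted Damerau-Levenshtein distance).
edit_distance <- function(a, b, transpose = FALSE) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  n <- length(ca)
  m <- length(cb)
  # d[i + 1, j + 1] = distance between the first i chars of a and the
  # first j chars of b (shifted by one because R indexes from 1)
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (ca[i] == cb[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution or match
      if (transpose && i > 1 && j > 1 &&
          ca[i] == cb[j - 1] && ca[i - 1] == cb[j]) {
        # adjacent transposition, e.g. "ts" versus "st"
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

# Normalized similarity in [0, 1]: 1 means identical strings
similarity <- function(a, b, ...) {
  1 - edit_distance(a, b, ...) / max(nchar(a), nchar(b))
}

ref    <- "test"
cands  <- c("tets", "estt", "rtes7", "gstes", "tes5t")
dists  <- sapply(cands, edit_distance, b = ref)
scores <- sapply(cands, similarity, b = ref)
round(scores, 2)
```

Plain Levenshtein counts the swap in "tets" as two edits, while the transposition-aware variant counts it as one. Newer versions of R also provide utils::adist() for generalized Levenshtein distances.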

--Hans


