[R] Compare String Similarity

Prof. Dr. Matthias Kohl Matthias.Kohl at stamats.de
Thu Apr 19 19:36:29 CEST 2012


You should also look at the Bioconductor package Biostrings.
hth
Matthias
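
P.S. In case it helps, here is a minimal base-R sketch of two ways to read the question. It is only an illustration, not a recommendation of one particular method. It assumes that "similarity" can mean either word overlap regardless of order (Jaccard similarity on the sets of words) or the edit distance between individual words (Levenshtein distance via adist() from the utils package). The helper name jaccard() is made up for this example.

```r
# Word vectors from the original question
string1 <- c("depending", "audience", "research", "school")
string2 <- c("audience", "push", "drama", "button", "depending")

# Jaccard similarity: number of shared words / number of distinct words.
# Word order does not matter because both inputs are treated as sets.
# (hypothetical helper, not part of any package)
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

sim <- jaccard(string1, string2)
print(sim)  # 2 shared words out of 7 distinct words, i.e. 2/7

# Levenshtein (edit) distance between every pair of words.
# adist() returns a length(string1) x length(string2) matrix.
d <- adist(string1, string2)
dimnames(d) <- list(string1, string2)
print(d)
```

The Jaccard value ignores order entirely, while the adist() matrix can be used to spot near matches such as misspelled words (small but non-zero distances).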

On 19.04.2012 18:01, R. Michael Weylandt wrote:
> Though if you do decide to use Levenshtein, it's implemented here in R:
> http://finzi.psych.upenn.edu/R/library/RecordLinkage/html/strcmp.html
>
> I'm pretty sure this is implemented in C, so it should be mighty fast.
>
> Michael
>
> On Thu, Apr 19, 2012 at 11:40 AM, Bert Gunter <gunter.berton at gene.com> wrote:
>> Wrong list. This is R, not statistics (or linguistics?). Please post elsewhere.
>>
>> -- Bert
>>
>> On Thu, Apr 19, 2012 at 8:05 AM, Alekseiy Beloshitskiy
>> <abeloshitskiy at velti.com>  wrote:
>>> Dear All,
>>>
>>> I need to estimate the level of similarity of two strings. For example:
>>> string1 <- c("depending", "audience", "research", "school")
>>> string2 <- c("audience", "push", "drama", "button", "depending")
>>>
>>> The words in each string may occur in a different order, though. Which function would you recommend for estimating similarity (e.g., Levenshtein distance)?
>>>
>>> I would appreciate any advice.
>>>
>>> -Alex
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

-- 
Prof. Dr. Matthias Kohl
www.stamats.de


