[R] text vector clustering

Ed Merkle edgar.merkle at wichita.edu
Fri Jan 23 21:08:42 CET 2009


Srinivas,

I don't know of a dedicated clustering algorithm for this, but you might
check out agrep() from the base package and stringMatch() from the
MiscPsycho package. Both help to identify similar text strings, and it may
be possible to group similar names by applying them repeatedly: match one
name against the rest, set its matches aside as a group, and continue with
the names that remain.
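
A minimal sketch of that repeated-matching idea, using only agrep() from
base R. The sample names, the function name group_similar(), and the 0.2
distance threshold are assumptions for illustration, not something tested
on the real 30,000-name file.

```r
# Hypothetical sample data; the real input would be the column of names
# read from the CSV file.
student_names <- c("Srinivas", "Srinivaas", "Srinvas",
                   "Raghavan", "Ragavan", "Merkle")

# Greedy grouping: take the first unassigned name, collect every still
# unassigned name that agrep() considers an approximate match, label them
# as one group, and repeat with whatever is left.
group_similar <- function(x, max.distance = 0.2) {
  groups <- integer(length(x))  # 0 means "not yet assigned"
  next_id <- 0L
  for (i in seq_along(x)) {
    if (groups[i] != 0L) next
    next_id <- next_id + 1L
    # max.distance as a fraction is relative to the pattern length
    hits <- agrep(x[i], x, max.distance = max.distance, ignore.case = TRUE)
    hits <- hits[groups[hits] == 0L]
    groups[c(i, hits)] <- next_id
  }
  groups
}

groups <- group_similar(student_names)
split(student_names, groups)  # list of names, one element per group
```

Note that agrep() looks for the pattern as an approximate substring, so a
short name can match inside a longer one, and the result depends on the
order of the names. Rerunning with a larger or smaller max.distance gives
looser or tighter groups, roughly in the spirit of the 90%/80% matching
asked about below.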

Ed

-- 
Ed Merkle, PhD
Assistant Professor
Dept. of Psychology
Wichita State University
Wichita, KS 67260


> Date: Thu, 22 Jan 2009 16:33:03 +0530
> From: srinivasa raghavan <srinivasraghav at gmail.com>
> Subject: [R] text vector clustering
> To: r-help at r-project.org
> Message-ID:
>         <e45b69190901220303u114028b1k43ef6f3ab7c7c104 at mail.gmail.com>
> Content-Type: text/plain
> 
> Hi,
> 
> I am a new user of R, using R 2.8.1 on Windows 2003. I have a CSV file with
> a single column that contains 30,000 student names. There were typing
> errors when these names were entered. The actual list contains fewer than
> 1000 distinct names. However, we don't have that list for a keyword search.
> 
> I am interested in grouping/clustering these names by how similar they
> are, letter by letter. Is there any text clustering algorithm in R
> that can group similar names into segments of exactly matching,
> 90% matching, 80% matching, etc.?
> 
> thanks in advance,
> 
> regards,
> srinivas
> statistical analyst.



