[R] matching similar character strings

A M Lavezzi mario.lavezzi at unipa.it
Fri Jun 21 11:56:27 CEST 2013


Hello everybody

I have this problem: I need to match an address database F1 with the
information contained in a toponymic database F2.

F1 has three columns and 800 rows; the columns are:

A1. Street/Road/Avenue
A2. Name
A3. Number

Consider, for instance, Avenue J. Kennedy, 3011. In F1 this is:

A1. Avenue
A2. J. Kennedy
A3. 3011

F2, instead, has 20,000 rows and five columns:

B1. Street/Road/Avenue
B2. Name
B3. Starting Street Number
B4. Ending Street Number
B5. Census section

So my problem is to attribute the census section B5 to every
observation of F1 for which A1 = B1, A2 = B2, and B3 <= A3 <= B4 (i.e. A3
lies between B3 and B4).
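Ignoring the name problem for a moment, the matching rule itself is an exact join on street type and name plus a range check on the number. A minimal sketch in Python (for illustration only; the logic does not depend on the language), assuming each file has been read into a list of tuples in the column order described above and that the names already agree exactly. The function names `build_index` and `assign_census_section` and the toy data are hypothetical:

```python
from collections import defaultdict


def build_index(f2_rows):
    """Group F2 rows by (B1, B2) -> list of (B3, B4, B5)."""
    index = defaultdict(list)
    for street_type, name, start, end, section in f2_rows:
        index[(street_type, name)].append((start, end, section))
    return index


def assign_census_section(f1_rows, f2_rows):
    """For each F1 row (A1, A2, A3), return the B5 of the first F2 row
    with A1 == B1, A2 == B2 and B3 <= A3 <= B4, or None if none matches."""
    index = build_index(f2_rows)
    result = []
    for street_type, name, number in f1_rows:
        section = None
        # .get() does not add missing keys to the defaultdict
        for start, end, sec in index.get((street_type, name), []):
            if start <= number <= end:
                section = sec
                break
        result.append(section)
    return result


# Illustrative toy data (hypothetical section codes)
f2 = [("Avenue", "Kennedy John", 1, 2999, "S101"),
      ("Avenue", "Kennedy John", 3000, 4999, "S102")]
f1 = [("Avenue", "Kennedy John", 3011),
      ("Avenue", "Kennedy John", 5000)]
print(assign_census_section(f1, f2))  # ['S102', None]
```

Grouping F2 by (B1, B2) first means each F1 row only scans the few ranges for its own street rather than all 20,000 rows.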

The problem is that, while A2 is recorded irregularly, B2 always has the
same format: family name, a space, then given name.

So, while in B2 the information is:

Kennedy John

In A2 it could be:

John Kennedy
JF Kennedy
J. Kennedy

and so on.
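One possible way to map the irregular A2 values onto the canonical B2 form is a token-based compatibility test rather than plain string equality. A minimal sketch in Python, under loudly stated assumptions: the family name is the *first* token of B2 (multi-word family names would need extra handling), A2 must contain that family name, and its first remaining token must be the full given name, a prefix of it, or an initials group of at most two letters starting with the given name's initial. The helpers `tokens`, `is_compatible` and `match_name` are hypothetical:

```python
import re


def tokens(name):
    """Lowercase and split on spaces and dots: 'J. Kennedy' -> ['j', 'kennedy']."""
    return [t for t in re.split(r"[\s.]+", name.lower()) if t]


def is_compatible(variant, canonical):
    """Could A2 value `variant` denote B2 value `canonical` ('Family Given')?"""
    canon = tokens(canonical)
    if not canon:
        return False
    family, given = canon[0], canon[1:]  # assumption: family name is one token
    var = tokens(variant)
    if family not in var:
        return False
    rest = [t for t in var if t != family]
    if not rest:
        return True  # surname only: compatible, but possibly ambiguous
    if not given:
        return False
    first = rest[0]
    return (first == given[0]                 # 'John'
            or given[0].startswith(first)     # 'J.' or 'Jo'
            or (len(first) <= 2 and first[0] == given[0][0]))  # 'JF'


def match_name(variant, candidates):
    """Return the single compatible B2 name, or None if zero or several match."""
    hits = [c for c in candidates if is_compatible(variant, c)]
    return hits[0] if len(hits) == 1 else None


cands = ["Kennedy John", "Kennedy Robert", "Lincoln Abraham"]
for a2 in ["John Kennedy", "JF Kennedy", "J. Kennedy", "Kennedy"]:
    print(a2, "->", match_name(a2, cands))
```

With these candidates the three variants from the question all map to "Kennedy John", while a bare "Kennedy" returns None because two streets qualify; such cases would still need manual review. Once A2 is mapped to its B2 form, the exact A1 = B1, A2 = B2 rule above applies unchanged.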

Thanks,

Mario

-- 
Andrea Mario Lavezzi
Dipartimento di Scienze Giuridiche, della Società e dello Sport
Sezione Diritto e Società
Università di Palermo
Piazza Bologni 8
90134 Palermo, Italy
tel. ++39 091 23892208
fax ++39 091 6111268
skype: lavezzimario
email: mario.lavezzi (at) unipa.it
web: http://www.unipa.it/~mario.lavezzi