[R] Matching failure in merge()

Agustin Lobo Agustin.Lobo at ija.csic.es
Thu Mar 19 20:04:48 CET 2009


Hi!

I have merged two data frames using merge():

delme <- merge(miDUNS50peqB, Bnomscodmunicipis,
               by.x = "POBLACION", by.y = "NOMMUNI",
               all.x = TRUE, sort = FALSE)

After noticing some problems in the resulting dataset,
I found that, in some cases, there was no match between
the by.x and the by.y elements, even though the values
appear to be identical. Specifically, I get no match for the
rows in which both the by.x and the by.y variables
are equal to "SANT VICENÇ DELS HORTS". In the following example I select
fields NOMMUNI and POBLACION from two rows in which both fields
should be identical to "SANT VICENÇ DELS HORTS"
(082634 is the municipality code for that town in Bnomscodmunicipis
and 08620 is the postal code for that town in delme), and I get:

 > x <- Bnomscodmunicipis[Bnomscodmunicipis$CODMUN=="082634",1][1]
 > y <- miDUNS50peqB[miDUNS50peqB$CODPOSTAL=="08620","POBLACION"][1]

 > str(x)
  chr "SANT VICENÇ DELS HORTS"
 > str(y)
  chr "SANT VICENÇ DELS HORTS"
 > x==y
[1] FALSE

which I cannot understand. If I copy and paste the printed
values and run the same comparison, I get:
> "SANT VICENÇ DELS HORTS" == "SANT VICENÇ DELS HORTS"
[1] TRUE
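
One way to look into a discrepancy like this is to compare the underlying code points and bytes rather than the printed strings. The following is only a sketch with hypothetical stand-ins for x and y (not the actual data): it assumes, purely for illustration, that one string stores "Ç" as a single precomposed character and the other as "C" plus a combining cedilla, which can print alike and still compare as different.

```r
# Hypothetical stand-ins for x and y (an assumption, not the real data):
# the same visible text built from different code points.
x <- "SANT VICEN\u00c7 DELS HORTS"    # precomposed C-cedilla (U+00C7)
y <- "SANT VICENC\u0327 DELS HORTS"   # plain C plus combining cedilla (U+0327)

same <- x == y                        # compares the stored strings, not the glyphs

# Look at the code points instead of the printed form.
cp_x <- utf8ToInt(x)
cp_y <- utf8ToInt(y)
n_x <- length(cp_x)
n_y <- length(cp_y)

# First position at which the two code point sequences diverge.
n <- min(n_x, n_y)
first_diff <- which(cp_x[seq_len(n)] != cp_y[seq_len(n)])[1]

# Raw bytes and declared encodings, useful for spotting hidden differences.
print(charToRaw(x))
print(charToRaw(y))
print(c(Encoding(x), Encoding(y)))
```

Running the same checks on the real x and y should show whether they differ in bytes, in length, or only in declared encoding.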

As a result, the values for "SANT VICENÇ DELS HORTS"
in the merged data frame are wrong.
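
To see which keys fail to match before trusting the merged result, the by.x keys can be compared against the by.y keys directly. This is a minimal sketch on toy data frames: the column names follow the post, but the rows and the hidden difference (a non-breaking space in one key) are assumptions made up for illustration.

```r
# Toy data frames; the contents are hypothetical, only the column names
# come from the post.
left <- data.frame(
  POBLACION = c("BARCELONA", "SANT VICEN\u00c7\u00a0DELS HORTS"),  # non-breaking space
  CODPOSTAL = c("08028", "08620"),
  stringsAsFactors = FALSE
)
right <- data.frame(
  NOMMUNI = c("BARCELONA", "SANT VICEN\u00c7 DELS HORTS"),         # ordinary space
  CODMUN  = c("080193", "082634"),
  stringsAsFactors = FALSE
)

# Keys on the left with no exact counterpart on the right.
unmatched <- setdiff(left$POBLACION, right$NOMMUNI)

merged <- merge(left, right, by.x = "POBLACION", by.y = "NOMMUNI",
                all.x = TRUE, sort = FALSE)

# With all.x = TRUE, unmatched left rows are kept and filled with NA.
na_rows <- merged[is.na(merged$CODMUN), ]
print(unmatched)
print(na_rows)
```

Any key listed in unmatched is one that merge() could not pair, whatever it looks like when printed.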

Any help with this issue would be greatly appreciated; I'm really
puzzled. I think it might involve an encoding problem
with the non-ASCII characters, but I cannot see where.
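
If the encoding hypothesis is right, one possible check is to declare the true encoding of each key column and convert both to UTF-8 before comparing or merging. The sketch below uses a made-up y that holds Latin-1 bytes without an encoding mark; that the real data look like this is an assumption, and enc2utf8() may not be available in older R versions, where iconv(y, from = "latin1", to = "UTF-8") could be tried instead.

```r
# Hypothetical example (not the real data): y holds Latin-1 bytes that
# have not been declared as such.
x <- "SANT VICEN\u00c7 DELS HORTS"                      # marked as UTF-8
y <- rawToChar(c(charToRaw("SANT VICEN"), as.raw(0xC7),
                 charToRaw(" DELS HORTS")))            # encoding "unknown"

# The stored bytes differ: C3 87 in x versus C7 in y.
bytes_equal <- identical(charToRaw(x), charToRaw(y))

# Declare the encoding y really has, then bring both keys to UTF-8.
Encoding(y) <- "latin1"
x_utf8 <- enc2utf8(x)
y_utf8 <- enc2utf8(y)

# After conversion the two keys hold the same bytes.
fixed <- identical(charToRaw(x_utf8), charToRaw(y_utf8))
print(c(bytes_equal = bytes_equal, fixed = fixed))
```

The same conversion could be applied to the POBLACION and NOMMUNI columns before calling merge(), provided the source encoding of each file is actually known.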

I'm using R 2.8.1 on Ubuntu 8.04 (in English; R is in English too).

Agus

-- 
Dr. Agustin Lobo
Institut de Ciencies de la Terra "Jaume Almera" (CSIC)
LLuis Sole Sabaris s/n
08028 Barcelona
Spain
Tel. 34 934095410
Fax. 34 934110012
email: Agustin.Lobo at ija.csic.es
http://www.ija.csic.es/gt/obster



