[R] Canberra distance

Frédéric Chiroleu frederic.chiroleu at cirad.fr
Tue Oct 16 09:47:51 CEST 2007


Hi,

I do not understand the definition of the Canberra distance used in R.

On the Internet and in the help pages of dist() from stats and 
Dist() from amap, the Canberra distance between vectors x and y, d(x,y), 
is given as:

d(x,y) = sum(abs(x-y)/(x+y))

But in practice, on simple examples, we find that the formula actually applied is:

d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y))

with NZ = number of coordinate pairs (x_i, y_i) that differ from (0,0) 
(the non-zero pairs).

The functions vegdist() from vegan and gdist() from mvpart, like the 
documentation of the ADE4 software, use (for positive variables):

d(x,y) = 1/NZ * sum(abs(x-y)/(x+y))

Can someone help me understand why the packages choose different 
formulas, and why the value computed by dist() differs from the formula 
given in its documentation?
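
For anyone who wants to reproduce the comparison, here is a minimal sketch in base R, assuming non-negative data. canberra_variants() is a hypothetical helper written for this message, not a function from stats, amap, vegan or mvpart. It computes the three formulas above by hand and then prints what the installed dist() returns for the same vectors.

```r
# Hypothetical helper: the three candidate Canberra formulas from this
# message, for two non-negative numeric vectors of equal length.
canberra_variants <- function(x, y) {
  terms <- abs(x - y) / (x + y)   # a (0,0) pair gives 0/0 = NaN
  keep  <- !is.nan(terms)         # drop the (0,0) pairs
  nz    <- sum(keep)              # NZ: number of pairs different from (0,0)
  s     <- sum(terms[keep])       # sum over the remaining pairs
  c(documented = s,                    # sum(abs(x-y)/(x+y))
    observed   = (nz + 1) / nz * s,    # formula inferred from examples
    averaged   = s / nz)               # 1/NZ * sum(...), vegdist()-style
}

x <- c(1, 0, 3, 0)
y <- c(2, 0, 1, 4)   # the second coordinate pair is (0,0)

res <- canberra_variants(x, y)
print(res)

# Value returned by the installed version of R, for comparison
print(dist(rbind(x, y), method = "canberra"))
```

Note that with exactly one (0,0) pair, as here, (NZ + 1)/NZ is the same as n/NZ, where n is the length of the vectors. An example with two or more (0,0) pairs would be needed to tell those two scalings apart.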

Thank you for your help.

Best regards,

Fred

PS: Be careful with the function dudi.pca() from ade4. Among the 
returned values, "norm" does not contain what the help page says: "norm" 
returns the vector of standard deviations of the initial variables when 
you choose a "normed" PCA, and the vector of standard deviations of the 
normed variables, i.e. 1, when you choose a non-"normed" PCA. We 
contacted the authors of the package to have the documentation 
corrected, without success.

-- 
Dr. Frédéric Chiroleu
Biometrician
CIRAD-Systèmes Biologiques (Cirad-Bios)
UMR 53 PVBMT (Peuplements Végétaux et Bio-agresseurs en Milieu Tropical)
Laboratoire d'Ecologie Terrestre et de Lutte Intégrée (LETLI)
Pôle de Protection des Plantes (3P)
7, chemin de l'IRAT
Ligne Paradis
97410 Saint-Pierre
Île de la Réunion - France
Tel.: +262 (0)262 499 230
Switchboard: +262 (0)262 499 200
Fax: +262 (0)262 499 293
Email: frederic.chiroleu at cirad.fr


