[Rd] Canberra distance

Duncan Murdoch murdoch at stats.uwo.ca
Sat Feb 6 17:10:07 CET 2010


On 06/02/2010 10:39 AM, Christophe Genolini wrote:
> Hi the list,
> 
> As far as I know, the Canberra distance between X and Y is
> sum[ |x_i - y_i| / (|x_i| + |y_i|) ], where | | denotes the absolute
> value.
> In the source code of the Canberra distance in the file distance.c, we
> find:
> 
>     sum = fabs(x[i1] + x[i2]);
>     diff = fabs(x[i1] - x[i2]);
>     dev = diff/sum;
> 
> which corresponds to the formula sum[ |x_i - y_i| / |x_i + y_i| ]
> (note that this does not define a distance. It is correct when x_i
> and y_i are positive, but not when a value is negative).
> 
> Is it on purpose or is it a bug?

It matches the documentation in ?dist, so it's not just a coding error.
It will give the same value as your definition if x_i and y_i have the
same sign (both positive or both negative), but a different value if
their signs differ.
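
To make the sign behaviour concrete, here is a minimal C sketch comparing
the two formulas. It is not the code from distance.c: the function names
canberra_r, canberra_abs and check_sign_behaviour are made up for this
illustration, and skipping terms whose denominator is zero is an assumption
of the sketch, not a statement about how R handles them.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Sum of |x_i - y_i| / |x_i + y_i|, the form in the quoted snippet.
   Assumption of this sketch: terms with a zero denominator are skipped. */
double canberra_r(const double *x, const double *y, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        double sum = fabs(x[i] + y[i]);
        double diff = fabs(x[i] - y[i]);
        if (sum > 0.0)
            total += diff / sum;
    }
    return total;
}

/* Sum of |x_i - y_i| / (|x_i| + |y_i|), the form given in the question.
   Same zero-denominator assumption as above. */
double canberra_abs(const double *x, const double *y, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        double sum = fabs(x[i]) + fabs(y[i]);
        double diff = fabs(x[i] - y[i]);
        if (sum > 0.0)
            total += diff / sum;
    }
    return total;
}

/* Self-check of the claim: same-sign pairs agree, mixed-sign pairs differ. */
void check_sign_behaviour(void)
{
    const double pos_a[] = {1.0, 2.0},   pos_b[] = {3.0, 6.0};
    const double neg_a[] = {-1.0, -2.0}, neg_b[] = {-3.0, -6.0};
    const double mix_a[] = {1.0},        mix_b[] = {-3.0};

    /* Both positive: the two formulas agree. */
    assert(fabs(canberra_r(pos_a, pos_b, 2) - canberra_abs(pos_a, pos_b, 2)) < 1e-12);
    /* Both negative: they still agree. */
    assert(fabs(canberra_r(neg_a, neg_b, 2) - canberra_abs(neg_a, neg_b, 2)) < 1e-12);
    /* Opposite signs: |1 - (-3)| / |1 + (-3)| = 2, but 4 / (1 + 3) = 1. */
    assert(canberra_r(mix_a, mix_b, 1) > canberra_abs(mix_a, mix_b, 1));
}
```

Calling check_sign_behaviour() from a main() runs the three checks. For the
pair (1, -3) the first form gives 2 and the second gives 1, so only the
second form keeps each term within [0, 1].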

The first three links I found searching Google Scholar for "Canberra 
distance" all define it only for non-negative data.  One of them gave 
exactly the R formula (even though the absolute value in the denominator 
is redundant), the others just put x_i + y_i in the denominator.

None of the 3 papers cited the origin of the definition, so I can't tell 
you who is wrong.

Duncan Murdoch


