[Rd] Canberra distance

Jari Oksanen jari.oksanen at oulu.fi
Sat Feb 6 19:13:46 CET 2010




On 06/02/2010 18:10, "Duncan Murdoch" <murdoch at stats.uwo.ca> wrote:

> On 06/02/2010 10:39 AM, Christophe Genolini wrote:
>> Hi the list,
>> 
>> According to what I know, the Canberra distance between X and Y is: sum[
>> (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the function
>> 'absolute value')
>> In the source code of the canberra distance in the file distance.c, we
>> find :
>> 
>>     sum = fabs(x[i1] + x[i2]);
>>     diff = fabs(x[i1] - x[i2]);
>>     dev = diff/sum;
>> 
>> which corresponds to the formula: sum[ (|x_i - y_i|) / (|x_i+y_i|) ]
>> (note that this does not define a distance... This is correct when x_i
>> and y_i are positive, but not when a value is negative.)
>> 
>> Is it on purpose or is it a bug?
> 
> It matches the documentation in ?dist, so it's not just a coding error.
> It will give the same value as your definition if the two items have
> the same sign (not only both positive), but different values if the
> signs differ.
> 
> The first three links I found searching Google Scholar for "Canberra
> distance" all define it only for non-negative data.  One of them gave
> exactly the R formula (even though the absolute value in the denominator
> is redundant), the others just put x_i + y_i in the denominator.

G'day cobbers, 

Without checking the original sources (which I can't do before Monday), I'd
say that the "Canberra distance" was originally suggested only for
non-negative data (abundances of organisms which are non-negative if
observed directly). The fabs(x-y) notation was used just as a convenient
tool to get rid of the original pmin(x,y) for non-negative data -- which is
nice in R, but not so natural in C. Extension of the "Canberra distance" to
negative data probably makes a new distance perhaps deserving a new name
(Eureka distance?).

If you ever go to Canberra and drive around you'll see that it's all going
through a roundabout after a roundabout, and going straight somewhere means
goin' 'round 'n' 'round. That may make you skeptical about the "Canberra
distance". 

Cheers, Jazza Oksanen


