[Rd] Canberra distance

Duncan Murdoch murdoch at stats.uwo.ca
Sat Feb 6 18:00:24 CET 2010


On 06/02/2010 11:31 AM, Christophe Genolini wrote:
> The definition I use is the one found in the book "Cluster analysis" by 
> Brian Everitt, Sabine Landau and Morven Leese.
> They cite, as the defining paper for the Canberra distance, an article by 
> Lance and Williams, "Computer programs for hierarchical polythetic 
> classification", Computer Journal, 1966.
> I do not have access, but here is the link: 
> http://comjnl.oxfordjournals.org/cgi/content/abstract/9/1/60
> Hope this helps.
> 

I do have access to that journal, and that paper gives the definition

sum(|x_i - y_i|) / sum(x_i + y_i)

and suggests the variation

sum( [ |x_i - y_i| / (x_i + y_i) ] )

It doesn't call either one the Canberra distance; it calls the first one 
the "non-metric coefficient" and doesn't name the second.  (I imagine 
the Canberra name came from the fact that the authors were at CSIRO in 
Canberra.)
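
To make the difference between the two forms concrete, here is a minimal C sketch. The function names are made up for illustration and come neither from the paper nor from R's distance.c, and the code assumes non-negative data with non-zero denominators.

```c
#include <math.h>
#include <stddef.h>

/* First form: sum(|x_i - y_i|) / sum(x_i + y_i), one ratio of two sums.
 * Hypothetical name; assumes the sum of x_i + y_i is non-zero. */
double ratio_of_sums(const double *x, const double *y, size_t n)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        num += fabs(x[i] - y[i]);
        den += x[i] + y[i];
    }
    return num / den;
}

/* Variation: sum(|x_i - y_i| / (x_i + y_i)), a sum of per-coordinate ratios.
 * Hypothetical name; assumes every x_i + y_i is non-zero. */
double sum_of_ratios(const double *x, const double *y, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += fabs(x[i] - y[i]) / (x[i] + y[i]);
    return total;
}
```

For x = (1, 10) and y = (3, 10), the first form gives 2/24 and the variation gives 2/4 + 0/20 = 0.5, so the two are clearly different quantities: the variation weights each coordinate by its own scale.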

So I'd agree your definition is better, but I don't know if it can 
really be called the "Canberra distance".

Duncan Murdoch

> Christophe
>> On 06/02/2010 10:39 AM, Christophe Genolini wrote:
>>> Hi the list,
>>>
>>> According to what I know, the Canberra distance between X and Y is: 
>>> sum[ (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the function 
>>> 'absolute value')
>>> In the source code of the canberra distance in the file distance.c, 
>>> we find :
>>>
>>>     sum = fabs(x[i1] + x[i2]);
>>>     diff = fabs(x[i1] - x[i2]);
>>>     dev = diff/sum;
>>>
>>> which corresponds to the formula: sum[ (|x_i - y_i|) / (|x_i+y_i|) ]
>>> (note that this does not define a distance... This is correct when 
>>> x_i and y_i are positive, but not when a value is negative.)
>>>
>>> Is it on purpose or is it a bug?
>> It matches the documentation in ?dist, so it's not just a coding 
>> error.  It will give the same value as your definition if the two 
>> items have the same sign (not only both positive), but different 
>> values if the signs differ.
>>
>> The first three links I found searching Google Scholar for "Canberra 
>> distance" all define it only for non-negative data.  One of them gave 
>> exactly the R formula (even though the absolute value in the 
>> denominator is redundant), the others just put x_i + y_i in the 
>> denominator.
>>
>> None of the 3 papers cited the origin of the definition, so I can't 
>> tell you who is wrong.
>>
>> Duncan Murdoch
>>
>>
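
The sign behaviour discussed in the quoted messages can be checked with a small C sketch of the per-coordinate term under each definition. The function names are hypothetical and used only for illustration; the first mirrors the quoted distance.c lines, the second the definition with |x_i| + |y_i| in the denominator.

```c
#include <math.h>

/* Term as in the quoted distance.c snippet: |x - y| / |x + y|.
 * Hypothetical name; divides by zero when x == -y. */
double term_abs_of_sum(double x, double y)
{
    return fabs(x - y) / fabs(x + y);
}

/* Term with absolute values taken separately: |x - y| / (|x| + |y|).
 * Hypothetical name; divides by zero only when x == y == 0. */
double term_sum_of_abs(double x, double y)
{
    return fabs(x - y) / (fabs(x) + fabs(y));
}
```

For (2, 3) and for (-2, -3) both terms equal 1/5, which matches the remark that the two definitions agree whenever the signs agree. For (-2, 3) the first gives 5/1 = 5 while the second gives 5/5 = 1; the second form never exceeds 1, since |x - y| <= |x| + |y|.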