[R] Canberra dist and double zeros

Jari Oksanen jarioksa at cc.oulu.fi
Mon Mar 5 09:32:19 CET 2001


Canberra distance is defined in function `dist' (standard library `mva') as

sum(|x_i - y_i| / |x_i + y_i|)

Obviously this is undefined when both x_i and y_i are zero, because the term 
becomes 0/0.  Since double zeros are common in many data sets, this is a 
nuisance.  In our field (where the distance comes from), it is customary to 
ignore double zeros: a term contributes zero to the distance when both x_i and 
y_i are zero.  Would it be possible to have this feature in R as well?

It seems that this would do the trick without breaking applications where 
double zeros do not occur:

--- R-1.2.2/src/appl/distance.c	Sun Oct 15 18:13:25 2000
+++ R-work/src/appl/distance.c	Mon Mar  5 10:16:53 2001
@@ -93,5 +93,5 @@
 double R_canberra(double *x, int nr, int nc, int i1, int i2)
 {
-    double dist;
+    double dist, sum;
     int count, j;
 
@@ -100,5 +100,7 @@
     for(j=0 ; j<nc ; j++) {
 	if(R_FINITE(x[i1]) && R_FINITE(x[i2])) {
-	    dist += fabs(x[i1] - x[i2])/fabs(x[i1] + x[i2]);
+	    sum = fabs(x[i1] + x[i2]);
+	    if (sum > 0.0)
+		dist += fabs(x[i1] - x[i2])/sum;
 	    count++;
 	}



Best wishes, Jari Oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526 (job), mobile +358 40 5136529
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/
