[Rd] Re: [R] Canberra dist and double zeros

Prof Brian D Ripley ripley@stats.ox.ac.uk
Tue, 6 Mar 2001 08:35:10 +0000 (GMT)


[Moved to R-devel, as more appropriate.]

On Mon, 5 Mar 2001, Jari Oksanen wrote:

> Canberra distance is defined in function `dist' (standard library `mva') as
>
> sum(|x_i - y_i| / |x_i + y_i|)
>
> Obviously this is undefined for cases where both x_i and y_i are zeros.  Since
> double zeros are common in many data sets, this is a nuisance.  In our field
> (from which the distance is coming), it is customary to remove double zeros:
> contribution to distance is zero when both x_i and y_i are zero.  Could it be
> possible to have this kind of feature in R as well?
>
> It seems that this would do the trick without breaking applications where
> double zeros do not occur:

I am sure we should do something, but is this exactly right?  For dist()
in the R-devel version (1.3.x, eventually) I have enabled the handling of
missing values.  With this solution, identically zero elements contribute
zero to the distance, and are not regarded as missing.  Canberra is
similar to binary, where x_i = y_i = 0 is treated as equivalent to
missing.  The issue is whether count should be incremented when
sum == 0.0 or not.
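
To make the two choices concrete, here is a minimal sketch.  It is not the
code in distance.c: the name canberra_sketch and the flag skip_double_zeros
are hypothetical, it works on two plain vectors rather than on rows of a
matrix, and it uses the C99 isfinite() in place of R_FINITE so that it
compiles on its own.  What the caller then does with count is deliberately
left open.

```c
#include <math.h>

/* Hypothetical helper, not part of R's distance.c.
 * Returns the Canberra sum over pairs where both values are finite and
 * stores in *count how many pairs were counted.
 * skip_double_zeros != 0: pairs with x[j] + y[j] == 0 are treated like
 *                         missing values and are not counted.
 * skip_double_zeros == 0: such pairs contribute zero but are counted. */
double canberra_sketch(const double *x, const double *y, int n,
                       int skip_double_zeros, int *count)
{
    double dist = 0.0, sum;
    int j;

    *count = 0;
    for (j = 0; j < n; j++) {
        if (isfinite(x[j]) && isfinite(y[j])) {
            sum = fabs(x[j] + y[j]);
            if (sum > 0.0) {
                dist += fabs(x[j] - y[j]) / sum;
                (*count)++;
            } else if (!skip_double_zeros) {
                (*count)++;     /* zero contribution, still counted */
            }
        }
    }
    return dist;
}
```

For x = (0, 1, 3) and y = (0, 1, 1) both settings give a sum of 0.5; only
count differs (3 with the flag off, 2 with it on), so the choice matters
only if the result is later rescaled by count.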

A related issue is the test (sum > 0.0).  I guess there are potential
problems with optimization on machines that use extended-precision
arithmetic, where sum might be non-zero in a register but zero if stored.
Not sure if that can actually happen, but a tolerance (e.g. machar's
xmin) is usually safer.
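
A sketch of what a tolerance test could look like for a single term.  The
helper name canberra_term is hypothetical, and the choice of DBL_MIN from
<float.h> (the smallest positive normalized double) as the threshold is an
assumption for illustration, not a recommendation from the patch:

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: one Canberra term, with the exact test
 * (sum > 0.0) replaced by a comparison against a small tolerance.
 * DBL_MIN is an assumed threshold; sums at or below it are treated
 * as zero and contribute nothing. */
double canberra_term(double a, double b)
{
    double sum = fabs(a + b);

    return (sum > DBL_MIN) ? fabs(a - b) / sum : 0.0;
}
```

With this test a subnormal sum is handled like an exact zero, so the
result no longer depends on whether a tiny value survives only in an
extended-precision register.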


>
> --- R-1.2.2/src/appl/distance.c	Sun Oct 15 18:13:25 2000
> +++ R-work/src/appl/distance.c	Mon Mar  5 10:16:53 2001
> @@ -93,5 +93,5 @@
>  double R_canberra(double *x, int nr, int nc, int i1, int i2)
>  {
> -    double dist;
> +    double dist, sum;
>      int count, j;
>
> @@ -100,5 +100,7 @@
>      for(j=0 ; j<nc ; j++) {
>  	if(R_FINITE(x[i1]) && R_FINITE(x[i2])) {
> -	    dist += fabs(x[i1] - x[i2])/fabs(x[i1] + x[i2]);
> +	    sum = fabs(x[i1] + x[i2]);
> +	    if (sum > 0.0)
> +		dist += fabs(x[i1] - x[i2])/sum;
>  	    count++;
>  	}
>
>
>
> Best wishes, Jari Oksanen
> --
> Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
> Ph. +358 8 5531526 (job), mobile +358 40 5136529
> email jari.oksanen@oulu.fi, homepage http://cc.oulu.fi/~jarioksa/
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
