dist() {"mva" package} bug: treats +/- Inf as NA

Martin Maechler Martin Maechler <maechler@stat.math.ethz.ch>
Mon, 21 Oct 2002 17:42:19 +0200

Vince Carey found this (thank you!).
Since the fix to the problem is not entirely obvious, I post
this to R-devel as RFC:

help(dist)  says:

 >>  Missing values are allowed, and are excluded from all computations
 >>  involving the rows within which they occur.  If some columns are
 >>  excluded in calculating a Euclidean, Manhattan or Canberra
 >>  distance, the sum is scaled up proportionally to the number of
 >>  columns used. If all pairs are excluded when calculating a
 >>  particular distance, the value is `NA'.

but the C code in  ....../src/library/mva/src/distance.c,
has, e.g. for the Euclidean distance :

    count= 0;
    dist = 0;
    for(j = 0 ; j < nc ; j++) {
	if(R_FINITE(x[i1]) && R_FINITE(x[i2])) {
	    dev = (x[i1] - x[i2]);
	    dist += dev * dev;
	i1 += nr;
	i2 += nr;
    if(count == 0) return NA_REAL;
    if(count != nc) dist /= ((double)count/nc);
    return sqrt(dist);

where it is clear that "R_FINITE(*)" should in principle be
replaced by  "!ISNAN(*)".

Note however that "Inf - Inf -> NaN" and e.g the canberra metric
has more ways to get "NaN".

The current code drops all pairs with an +-Inf in it.
I would be inclined to really replace
	if(R_FINITE(x[i1]) && R_FINITE(x[i2])) {
	if(!ISNAN(x[i1])   && !ISNAN(x[i2])) {

for all metrics -- for R-patched.

But I'd also see reasons where we'd want to be smarter/different
than that. One possibility would be to drop the pair (as
currently) also when both are not finite, 
for "binary" to signal an error for +- Inf,
and for "Canberra" :
    Of course,  d =  |x - y| / |x + y| 
    will be 1 , when one of {x,y} is infinite.
This could be considered the desired answser, or also
we may give a warning in any case.


Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch