[R] max.col: bug or just oddity?

John Zedlewski jzedlewski at hbs.edu
Thu Dec 14 21:57:41 CET 2006


I've noticed that the max.col function with the default "random"
option often gives unexpected results. For instance, in this test, it
seems clear what the answer should be:

> # second col should always be max
> x1 = cbind(1:10, 2:11, -Inf)
>
> # this works fine
> max.col(x1, "first")
 [1] 2 2 2 2 2 2 2 2 2 2
>
> # this gives random answers
> max.col(x1)
> [1] 3 1 1 2 3 3 1 3 1 1

Ouch! max.col is randomizing across all values.
Even without infinite values, something similar can happen:

> # test 2 --- tolerance problems
>
> # clearly column 3 is the max
> x1 = cbind(-1e9 * 1:10, 1:10, 2:11)
>
> # again, first method works:
> max.col(x1, "first")
 [1] 3 3 3 3 3 3 3 3 3 3
>
> # but random doesn't
> max.col(x1)
 [1] 2 3 2 3 3 2 2 2 3 2
>

The max.col docs say " there is a relative tolerance of 1e-5, relative
to the largest entry in the row", but it's really using the maximum
absolute value entry in the row (appl/maxcol.c, line 35 in R 2.4.0).
Is this necessary for some sort of S-plus compatibility? If so, I
think it would be good to make this absolute value property very clear
in the docs, since it can cause subtle bugs (and did for me).

Personally, I think the behavior is much nicer with the following patch:

--- rplain/R-2.4.0/src/appl/maxcol.c    2006-04-09 18:19:58.000000000 -0400
+++ R-2.4.0/src/appl/maxcol.c   2006-12-14 15:30:56.000000000 -0500
@@ -26,13 +26,14 @@
        do_rand = *ties_meth == 1;

     for (r = 0; r < n_r; r++) {
-       /* first check row for any NAs and find the largest abs(entry) */
+       /* first check row for any NAs and find the largest entry */
        large = 0.0;
        isna = FALSE;
        for (c = 0; c < *nc; c++) {
            a = matrix[r + c * n_r];
            if (ISNAN(a)) { isna = TRUE; break; }
-           if (do_rand) large = fmax2(large, fabs(a));
+           if (!R_FINITE(a)) continue;
+           if (do_rand) large = fmax2(large, a);
        }
        if (isna) { maxes[r] = NA_INTEGER; continue; }

--------------------------------------- END
-------------------------------------------------

This gives the expected behavior in the two examples above.



More information about the R-help mailing list