[Rd] The function cummax() seems to have a bug.

Henrik Bengtsson henrik.bengtsson at ucsf.edu
Mon May 18 02:22:45 CEST 2015


Below is some further troubleshooting on this:

>From code inspection this bug happens for only:

* for integer values
* when the first element is NA_integer_ and the second is not.


Examples:

# Numeric/doubles works as expected
> cummax(c(NA_real_, 0, 1, 2, 3))
[1] NA NA NA NA NA

# It does not occur when the first value is non-NA
> cummax(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA

# When first value is NA, it is not "remembered"
# (because internal for loop starts with 2nd element)
> cummax(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA  0  1  2  3

The problem is not there for cummin():

> cummin(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA
> cummin(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA NA NA NA NA

but that is just "pure luck" due to the fact how NA_integer_ is
internally represented as the smallest possible 4-byte signed integer,
i.e.

LibExtern int    R_NaInt;   /* NA_INTEGER:= INT_MIN currently */
#define NA_INTEGER  R_NaInt

Note the comment, which implies that code should not rely on the
assumption that NA_integer_ == NA_INTEGER == R_NaInt == INT_MIN; it
could equally well have been INT_MAX, which in case cummin()would
return the wrong result whereas cummax() wouldn't. So, cummin() makes
the same mistake ascummax(), where the for-loop skips the test for NA
of the first element, cf.
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L145-L148

The simple solution is probably to do (cf. native icumsum):

[HB-X201]{hb}: svn diff src/main/cum.c
Index: src/main/cum.c
===================================================================
--- src/main/cum.c      (revision 68378)
+++ src/main/cum.c      (working copy)
@@ -130,7 +130,7 @@
     int *ix = INTEGER(x), *is = INTEGER(s);
     int max = ix[0];
     is[0] = max;
-    for (R_xlen_t i = 1 ; i < xlength(x) ; i++) {
+    for (R_xlen_t i = 0 ; i < xlength(x) ; i++) {
        if(ix[i] == NA_INTEGER) break;
        is[i] = max = (max > ix[i]) ? max : ix[i];
     }
@@ -142,7 +142,7 @@
     int *ix = INTEGER(x), *is = INTEGER(s);
     int min = ix[0];
     is[0] = min;
-    for (R_xlen_t i = 1 ; i < xlength(x) ; i++ ) {
+    for (R_xlen_t i = 0 ; i < xlength(x) ; i++ ) {
        if(ix[i] == NA_INTEGER) break;
        is[i] = min = (min < ix[i]) ? min : ix[i];
     }

/Henrik

On Sun, May 17, 2015 at 4:13 AM, Dongcan Jiang <dongcan.jiang at gmail.com> wrote:
> Hi,
>
> The function cummax() seems to have a bug.
>
>> x <- c(NA, 0)
>> storage.mode(x) <- "integer"
>> cummax(x)
> [1] NA  0
>
> The correct result of this case should be NA NA. The mistake in
> [https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
> the reason.
>
> Best Regards,
> Dongcan
>
> --
> Dongcan Jiang
> Team of Search Engine & Web Mining
> School of Electronic Engineering & Computer Science
> Peking University, Beijing, 100871, P.R.China



More information about the R-devel mailing list