[Rd] Speeding up sum and prod

Radford Neal radford at cs.toronto.edu
Mon Aug 23 19:19:01 CEST 2010


Looking for more ways to speed up R, I've found that large
improvements are possible in the speed of "sum" and "prod" for long
real vectors.  

Here is a little test with R version 2.11.1 on an Intel Linux system:

> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
   user  system elapsed
  4.800   0.010   4.817
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
   user  system elapsed
  8.240   0.030   8.269

and here is the same with "sum" and "prod" modified as described below:

> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
   user  system elapsed
   1.81    0.00    1.81
> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
   user  system elapsed
  7.250   0.010   7.259

That's a speedup by a factor of 2.65 in user time (4.800 down to 1.81
seconds) for real vectors of length 1000 with na.rm=FALSE (the
default), and a 12% reduction in user time (8.240 down to 7.250
seconds) when na.rm=TRUE.  Of course, the improvement is smaller for
very short vectors.

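The biggest reason for the improvement is that the current code (in
2.11.1 and in the development release of 2010-08-19) makes a costly
call of ISNAN for every element even when the option is na.rm=FALSE,
in which case the result of the check has no effect on the sum.  The
inner loop can also be sped up a bit in other respects, for instance
by not testing the "updated" flag for every element.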
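
The reason the check is made regardless of na.rm is that C evaluates
the operands of || from left to right, so a condition that tests for
NaN first and the flag second always performs the NaN test.  Here is a
small standalone sketch of that effect.  It is not taken from the R
sources: counted_isnan is a hypothetical helper that wraps the standard
isnan from math.h (standing in for R's ISNAN) and counts its calls, and
long double stands in for LDOUBLE.

```c
#include <math.h>

/* Hypothetical helper for illustration only: wraps the standard
   isnan() and counts how many times it is called. */
static long nan_checks = 0;

static int counted_isnan(double v)
{
    nan_checks++;
    return isnan(v);
}

/* NaN test written first, flag second: the NaN test runs for every
   element, whatever the value of narm.  Returns the number of checks. */
static long checks_nan_first(const double *x, int n, int narm)
{
    long double s = 0.0;
    int i;

    nan_checks = 0;
    for (i = 0; i < n; i++) {
        if (!counted_isnan(x[i]) || !narm)
            s += x[i];
    }
    (void) s;  /* the sum itself is not of interest here */
    return nan_checks;
}

/* Operands swapped: when narm is false, || short-circuits and the
   NaN test is never evaluated.  Returns the number of checks. */
static long checks_flag_first(const double *x, int n, int narm)
{
    long double s = 0.0;
    int i;

    nan_checks = 0;
    for (i = 0; i < n; i++) {
        if (!narm || !counted_isnan(x[i]))
            s += x[i];
    }
    (void) s;
    return nan_checks;
}
```

For a vector of length 5 and narm false, checks_nan_first reports 5
checks and checks_flag_first reports 0.  With narm true both report 5.
Swapping the operands is only one way to avoid the call.  Splitting the
loop on narm also removes the per-element test of the flag.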

Here is the old procedure, in src/main/summary.c:

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;

    for (i = 0; i < n; i++) {
        if (!ISNAN(x[i]) || !narm) {
            if(!updated) updated = TRUE;
            s += x[i];
        }
    }
    *value = s;

    return(updated);
}

and here is my modified version:

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;

    if (narm) {
        for (i = 0; i < n; i++) {
            if (!ISNAN(x[i])) {
                s += x[i];
                updated = TRUE;
                break;
            }
        }
        for (i = i+1; i < n; i++) {
            if (!ISNAN(x[i]))
                s += x[i];
        }
    } else {
        for (i = 0; i < n; i++)
            s += x[i];
        if (n>0) updated = TRUE;
    }

    *value = s;

    return(updated);
}

An entirely analogous improvement can be made to the "prod" function.
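
As a rough sketch of what the analogous change might look like, here
is a standalone version for a product.  It assumes that the product
routine has the same shape as rsum, with multiplication in place of
addition and an initial value of 1.  The name rprod_sketch is made up
for this illustration, and long double, int, and the standard isnan
stand in for R's LDOUBLE, Rboolean, and ISNAN so that it compiles
outside the R sources.

```c
#include <math.h>

/* Standalone sketch, not the actual code in summary.c.  Returns
   nonzero if at least one element contributed to the product. */
static int rprod_sketch(const double *x, int n, double *value, int narm)
{
    long double s = 1.0;
    int i;
    int updated = 0;

    if (narm) {
        /* Find the first non-NaN element, so that "updated" is set
           once rather than tested for every element. */
        for (i = 0; i < n; i++) {
            if (!isnan(x[i])) {
                s *= x[i];
                updated = 1;
                break;
            }
        }
        /* Multiply in the remaining non-NaN elements. */
        for (i = i+1; i < n; i++) {
            if (!isnan(x[i]))
                s *= x[i];
        }
    } else {
        /* No NaN test at all when NaNs are not being removed. */
        for (i = 0; i < n; i++)
            s *= x[i];
        if (n > 0) updated = 1;
    }

    *value = s;

    return updated;
}
```

With narm set, NaN elements are skipped, and a vector consisting only
of NaNs leaves the value at 1 with a zero return.  With narm unset, any
NaN propagates into the result, as it does in the sum.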

   Radford Neal


