[Rd] anyNA() performance on vectors of POSIXct

Harvey Smith h@rvey13131 @end|ng |rom gm@||@com
Wed May 1 09:20:55 CEST 2019


Inside anyNA(), the legacy any(is.na(x)) code path is used whenever x is an
OBJECT().  A vector of POSIXct is an OBJECT(), but it also has
TYPEOF(x) == REALSXP.  It therefore skips the faster ITERATE_BY_REGION
path, which is typically about 5x faster in my testing.

Is the OBJECT() condition really necessary, or could it be moved after the
switch() for the individual TYPEOF(x) ITERATE_BY_REGION calls?

# script to demonstrate the performance difference depending on whether x is
# an OBJECT or not, by using unclass()
x.posixct = Sys.time() + 1:1e6
microbenchmark::microbenchmark(
  any(is.na( x.posixct )),
  anyNA( x.posixct ),
  anyNA( unclass(x.posixct) ),
  unit='ms')

The relevant part of the C implementation of anyNA():



static Rboolean anyNA(SEXP call, SEXP op, SEXP args, SEXP env)
{
  SEXP x = CAR(args);
  SEXPTYPE xT = TYPEOF(x);
  Rboolean isList =  (xT == VECSXP || xT == LISTSXP), recursive = FALSE;

  if (isList && length(args) > 1) recursive = asLogical(CADR(args));
  if (OBJECT(x) || (isList && !recursive)) { /* <-- the condition in question */
    SEXP e0 = PROTECT(lang2(install("is.na"), x));
    SEXP e = PROTECT(lang2(install("any"), e0));
    SEXP res = PROTECT(eval(e, env));
    int ans = asLogical(res);
    UNPROTECT(3);
    return ans == 1; // so NA answer is false.
  }

  R_xlen_t i, n = xlength(x);
  switch (xT) {
    case REALSXP:
    {
      if(REAL_NO_NA(x))
        return FALSE;
      ITERATE_BY_REGION(x, xD, i, nbatch, double, REAL, {
        for (int k = 0; k < nbatch; k++)
          if (ISNAN(xD[k]))
            return TRUE;
      });
      break;
    }
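
To make the fast path above concrete, here is a minimal standalone sketch of the same idea in plain C: walk the data one batch at a time and return as soon as a NaN is seen. It is only an illustration and not R's ITERATE_BY_REGION macro. The function name any_nan_batched and the BATCH_SIZE value are hypothetical, and the standard isnan() stands in for R's ISNAN().

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch size, chosen for illustration only;
   it is not taken from the R sources. */
#define BATCH_SIZE 512

/* Return true if any element of x[0..n-1] is NaN.
   In R, NA_real_ is itself a NaN value, so a NaN test covers both
   NA and NaN; isnan() stands in for R's ISNAN() here. */
static bool any_nan_batched(const double *x, size_t n)
{
    for (size_t start = 0; start < n; start += BATCH_SIZE) {
        /* Length of the current batch (the last one may be shorter). */
        size_t nbatch = (n - start < BATCH_SIZE) ? n - start : BATCH_SIZE;
        const double *xD = x + start;   /* start of the current batch */

        for (size_t k = 0; k < nbatch; k++)
            if (isnan(xD[k]))
                return true;            /* early exit on the first hit */
    }
    return false;
}
```

Unlike any(is.na(x)), a scan of this kind never allocates a logical vector the length of x and can stop at the first missing value.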
