[Rd] nchar(x, type = "bytes") seems slower than it could be

Tomas Kalibera
Thu Apr 15 15:31:31 CEST 2021


For reference, fixed in R-devel (80153).
Tomas

On 3/30/21 10:20 AM, Tomas Kalibera wrote:
> Thanks for the report. You are probably running into the overhead of 
> the eager creation of the error message. On my system, with your 
> micro-benchmark, it accounts for about a 10x slowdown. I tested this 
> simply by commenting that code out and re-running the benchmark. I'll 
> fix it (this is not a good task for a contributed patch).
>
> Best,
> Tomas
>
> On 3/30/21 8:02 AM, Hugh Parsonage wrote:
>> While profiling some C code, I rolled my own nchar function which
>> appears to be much faster than base R's (25 times faster for a 10M
>> length vector).  Obviously base::nchar provides significantly more
>> features than my barebones function (C snippet below); however, for
>> argument type = "bytes" it seems that the R_nchar and do_nchar
>> functions do not actually do anything more than this function.
>> My suspicion is that I have overlooked some subtlety in the base R
>> code, or that my benchmarks are not representative. Alternatively,
>> the step in `do_nchar` that prepares the potential error message
>> before each call to `R_nchar` may be quite costly. Or the compiler
>> may be unable to unswitch the loop from the more complex handling
>> of the width and chars types.
>>
>> If I haven't missed something, would a patch be warranted?
>>
>> SEXP Cnchar(SEXP x) {
>>    R_xlen_t N = xlength(x);
>>    SEXP ans = PROTECT(allocVector(INTSXP, N));
>>    int * restrict ansp = INTEGER(ans);
>>
>>    // Ignoring NA to avoid the branch has a very small
>>    // impact on performance.
>>    for (R_xlen_t i = 0; i < N; ++i) {
>>      SEXP sxi = STRING_ELT(x, i);
>>      if (sxi == NA_STRING) {
>>        ansp[i] = NA_INTEGER;
>>        continue;
>>      }
>>      ansp[i] = length(sxi);
>>    }
>>    UNPROTECT(1);
>>    return ans;
>> }
>>
>> x <- rep_len(c(as.character(c(5L, 1:1e6)), NA_character_, 1e6:15e5), 1e7)
>> Cnchar(x)
>> 90 ms
>> nchar(x, type = "bytes")
>> 2500 ms
>>
>
>
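To illustrate the overhead discussed above, here is a minimal C sketch of eager versus deferred construction of an error message. It is not the code in R's sources and not the change made in R-devel; it uses plain C strings instead of SEXPs, and all names in it (total_eager, total_lazy, report, error_count, last_error) are hypothetical. The eager variant formats a message for every element before the call, as the thread suspects happens; the deferred variant passes only the index and formats the message if an error actually occurs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MSG_LEN 32

/* Hypothetical error sink: counts errors and keeps the last message. */
static int error_count = 0;
static char last_error[MSG_LEN] = "";

static void report(const char *msg) {
    ++error_count;
    snprintf(last_error, sizeof last_error, "%s", msg);
}

/* Eager variant: the caller has already formatted msg, used or not. */
static int len_eager(const char *s, const char *msg) {
    if (s == NULL) { report(msg); return -1; }
    return (int) strlen(s);
}

size_t total_eager(const char **x, size_t n) {
    assert(x != NULL || n == 0);
    char msg[MSG_LEN];
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Formatting cost is paid on every iteration. */
        snprintf(msg, sizeof msg, "element %zu", i + 1);
        int len = len_eager(x[i], msg);
        if (len >= 0) total += (size_t) len;
    }
    return total;
}

/* Deferred variant: only the index is passed; the message is
   formatted on the (rare) error path. */
static int len_lazy(const char *s, size_t i) {
    if (s == NULL) {
        char msg[MSG_LEN];
        snprintf(msg, sizeof msg, "element %zu", i + 1);
        report(msg);
        return -1;
    }
    return (int) strlen(s);
}

size_t total_lazy(const char **x, size_t n) {
    assert(x != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        int len = len_lazy(x[i], i);
        if (len >= 0) total += (size_t) len;
    }
    return total;
}
```

Both variants return the same totals and report the same errors; they differ only in when the formatting work is done. When the per-element work is as cheap as reading a stored length, a formatting call on every iteration can plausibly dominate the loop, which would be consistent with the timings reported in this thread.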


