[R] Has R recently made performance improvements in accumulation?

Brent yhbrent at yahoo.com
Mon Aug 1 04:20:47 CEST 2016

Thierry: thanks much for your feedback, and apologies for this tardy response.

You pointed me in the right direction. I did not appreciate that even if an algorithm ultimately has O(n^2) behavior, it can take a large n before the n^2 term outweighs the large coefficients on the lower-order terms (e.g. the O(1) and O(n) parts).
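Here is a small sketch of that effect. It uses a hypothetical cost model T(n) = k0 + k1*n + k2*n^2 with made-up coefficients; k0, k1, k2 and the helper names costModel() and localSlope() are illustrative assumptions and are not fitted to my timings. The local log-log slope n*T'(n)/T(n) is roughly what a power-law fit over a range of n estimates, and it only approaches 2 once the n^2 term dominates:

```r
# Hypothetical cost model T(n) = k0 + k1*n + k2*n^2 for an O(n^2) algorithm.
# The default coefficients are made up: a large O(n) term, a tiny n^2 term.
costModel <- function(n, k0 = 1, k1 = 1e-2, k2 = 1e-7) {
  k0 + k1 * n + k2 * n^2
}

# Local log-log slope: d log T / d log n = n * T'(n) / T(n).
# A power-law fit over a range of n estimates roughly this quantity.
localSlope <- function(n, k0 = 1, k1 = 1e-2, k2 = 1e-7) {
  (k1 * n + 2 * k2 * n^2) / costModel(n, k0, k1, k2)
}

n <- 10^(3:8)
slopes <- localSlope(n)
print(data.frame(n = n, slope = round(slopes, 3)))
```

With these particular coefficients the slope climbs from about 0.92 at n = 1,000 to about 1.999 at n = 10^8, so a fit over a modest range of n can report an exponent well below 2 even though the model is quadratic.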

A quick fix to my original code is simply to use 100 columns per row instead of 10, and to time larger numbers of rows as well:

    # mkFrameForLoop() is from my original code (not repeated here)
    n = 20
    numRows = seq(from = 1000, to = 20000, length.out = n)   # 1000, 2000, ..., 20000
    nCol = 50
    execTimes = numeric(n)   # elapsed seconds for each row count
    for (i in seq_len(n)) {
        nRow = numRows[i]
        t1 = Sys.time()
        mkFrameForLoop(nRow, nCol)
        t2 = Sys.time()
        # CRITICAL: use difftime(..., units = "secs") rather than t2 - t1,
        # which picks its units automatically (e.g. minutes for long runs)
        execTimes[i] = as.numeric(difftime(t2, t1, units = "secs"))
    }
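To illustrate why the difftime() comment matters, here is a minimal sketch with two fixed timestamps 90 seconds apart (the times themselves are arbitrary). Plain subtraction chooses its units automatically, so storing as.numeric(t2 - t1) would silently record 1.5 (minutes) rather than 90 (seconds):

```r
# Two arbitrary timestamps exactly 90 seconds apart.
t1 <- as.POSIXct("2016-08-01 00:00:00", tz = "UTC")
t2 <- t1 + 90                     # adding a number to POSIXct adds seconds

autoDiff <- t2 - t1               # units chosen automatically: minutes here
secsDiff <- difftime(t2, t1, units = "secs")

print(autoDiff)                   # Time difference of 1.5 mins
print(as.numeric(autoDiff))       # 1.5, NOT seconds
print(as.numeric(secsDiff))       # 90
print(as.double(autoDiff, units = "secs"))  # 90, converted explicitly
```

Either fixing the units in difftime() or converting explicitly with as.double(x, units = "secs") keeps every entry of execTimes in the same unit.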

A simple plot shows obvious nonlinearity now:
    plot(numRows, execTimes)

For those of you reading this as plain-text email, a human-readable table can be obtained with
    df = data.frame(numRows = numRows, execTimes = execTimes)
which yields
       numRows  execTimes
    1     1000   3.564204
    2     2000   8.268473
    3     3000  14.923853
    4     4000  23.506344
    5     5000  31.379795
    6     6000  43.820506
    7     7000  56.720244
    8     8000  72.979174
    9     9000  97.328567
    10   10000 113.404486
    11   11000 141.113071
    12   12000 145.597327
    13   13000 168.967664
    14   14000 196.135218
    15   15000 219.662564
    16   16000 237.763599
    17   17000 275.018730
    18   18000 305.647482
    19   19000 327.215715
    20   20000 359.673572

Finally, a quick power-law fit using
    lm(log(execTimes) ~ log(numRows), data = df)
gives
     (Intercept)  log(numRows)  
         -10.065         1.605  
i.e. over this range of data the fitted exponent is 1.605, which is clearly > 1.
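For anyone who wants to check the fit without rerunning the 20 timings, here is a sketch that rebuilds df from the table above and refits the power law; it should reproduce the coefficients shown (about -10.065 and 1.605). It also fits a quadratic, execTimes ~ numRows + I(numRows^2), as an alternative reading that keeps the lower-order terms explicit. That second model is added here for comparison and was not part of the original analysis:

```r
# Timings copied from the table above (seconds); nothing is re-measured here.
numRows <- seq(from = 1000, to = 20000, length.out = 20)
execTimes <- c(3.564204, 8.268473, 14.923853, 23.506344, 31.379795,
               43.820506, 56.720244, 72.979174, 97.328567, 113.404486,
               141.113071, 145.597327, 168.967664, 196.135218, 219.662564,
               237.763599, 275.018730, 305.647482, 327.215715, 359.673572)
df <- data.frame(numRows = numRows, execTimes = execTimes)

# A power law T = A * n^p is a straight line on log-log axes:
# log(T) = log(A) + p * log(n), so the slope of this fit estimates p.
powerFit <- lm(log(execTimes) ~ log(numRows), data = df)
p <- unname(coef(powerFit)["log(numRows)"])
print(coef(powerFit))

# Alternative: model the time directly as a quadratic in the row count,
# with intercept and linear terms standing in for the lower-order costs.
quadFit <- lm(execTimes ~ numRows + I(numRows^2), data = df)
print(coef(quadFit))
```

The power-law exponent summarizes the whole range in one number, while the quadratic fit makes the separate O(1), O(n) and O(n^2) contributions visible.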

boB Rudis: thanks much for the functional elegance suggestion.
