[R] How to make R running faster

Robert A LaBudde ral at lcfltd.com
Wed May 28 17:56:31 CEST 2008


At 10:25 AM 5/28/2008, Esmail Bonakdarian wrote:
>Erin Hodgess wrote:
>>I remember reading the colSum and colMean were better, when you need
>>sums and means
>
>Well .. I'm waiting for the experts to jump in and give us the
>straight story on this :-)

All of the algorithms are represented internally by sequential 
program logic using C or Fortran, for example. So the issue isn't the 
algorithm itself. Instead, it's where the algorithm is implemented.

However, R is an interpreter, not a compiler. This means that it
works out what each piece of R code asks for while the program is
running: it evaluates every expression step by step, looking up
functions and checking data classes each time the expression is
executed. Interpreters are very slow compared to compiled code, where
the lines have already been translated to machine code and most of
the error checking was resolved in advance.

For example a simple "for" loop iteration might take only 0.1 
microsecond in a compiled program, but 20-100 microseconds in an 
interpreted program.

This per-line overhead of interpretation can be spread over the work
done by the function calls inside each line. If a compiled function
processes a large number of cases in one call, then the 50 microsecond
overhead per call is diluted out.
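
To make the dilution concrete, here is a minimal sketch that does the
same arithmetic two ways: one interpreted iteration per element, and
one call each to sqrt() and sum() on the whole vector. The names
loop_sum and vec_sum are made up for this illustration, and the
timings will depend on your machine and R version.

```r
# Same computation done with many small interpreted steps versus
# a couple of calls into compiled code.
set.seed(1)
x <- runif(1e6)

# Hypothetical helper: one interpreted iteration per element
loop_sum <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + sqrt(v[i])
  }
  total
}

# Hypothetical helper: the looping happens inside sqrt() and sum()
vec_sum <- function(v) sum(sqrt(v))

t_loop <- system.time(s1 <- loop_sum(x))
t_vec  <- system.time(s2 <- vec_sum(x))

print(t_loop["elapsed"])
print(t_vec["elapsed"])
print(all.equal(s1, s2))  # checks that the two results agree up to rounding
```

The interpreter overhead is paid a million times in the first version
and only a handful of times in the second.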

R is a vector processing language: its built-in functions operate on
whole vectors and arrays in a single call. If you use vectors and
arrays and the built-in (i.e., compiled) function calls, you get
maximum use of the compiled programs and minimum use of the
interpreted program.

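This is why functions such as colMeans() or rowSums() are faster than
writing direct loops in R. (apply() is more convenient than a loop,
but it still loops in R code, so expect much less from it.) You can
speed things up by as much as 200-1000x for large arrays.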
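
A rough way to check this on your own machine is sketched below. The
function col_means_loop is a made-up name for an explicit double loop;
colMeans() and apply() are the standard R functions. apply() is
included for comparison only: it loops over the columns in R code and
calls mean() once per column, so it may land much closer to the
explicit loop than to colMeans(). The ratios you see will vary with
matrix size and R version.

```r
set.seed(2)
m <- matrix(rnorm(500 * 200), nrow = 500, ncol = 200)

# Hypothetical helper: one interpreted step per matrix element
col_means_loop <- function(a) {
  out <- numeric(ncol(a))
  for (j in seq_len(ncol(a))) {
    s <- 0
    for (i in seq_len(nrow(a))) {
      s <- s + a[i, j]
    }
    out[j] <- s / nrow(a)
  }
  out
}

r_loop  <- col_means_loop(m)
r_apply <- apply(m, 2, mean)  # R-level loop over columns
r_cm    <- colMeans(m)        # a single call into compiled code

# Repeat each approach 20 times so the timings are measurable
print(system.time(for (k in 1:20) col_means_loop(m)))
print(system.time(for (k in 1:20) apply(m, 2, mean)))
print(system.time(for (k in 1:20) colMeans(m)))
```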

Interpreted languages are very convenient to use, as they do instant
error checking and are very interactive, with no overhead of learning
and using compilers and linkers. But they are very slow on complex
calculations. This is why the array processing is stuffed into
compiled functions: you get the best of both worlds.

Interpreted languages include R, MATLAB, Gauss and others. Compiled
languages include C and Fortran. Some fall in between: Java is
compiled to bytecode that runs on a virtual machine, and variants of
BASIC can be interpreted, line-compiled or compiled, depending upon
implementation. Some compiled languages (such as Fortran) can allow
parallel processing via multiprocessing on multiple CPUs, which
speeds things up even more. Compilers also typically optimize code
for the target machine, which can speed things up by a factor of 2
or so.

So the general rule for R is: If you are annoyed at processing time,
alter your program to maximize calculations within compiled functions
(i.e., "vectorize" your program to process an entire array at one
time) and minimize the number of R statements that get executed.
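
As an illustration of what "vectorizing" a loop can look like, here is
a minimal sketch that squares the positive entries of a vector and
zeroes the rest. The function names clip_loop and clip_vec are
invented for this example; ifelse() is the standard R function. Treat
it as one possible rewrite, not the only one.

```r
set.seed(3)
x <- rnorm(1e5)

# Hypothetical helper: one interpreted test and assignment per element
clip_loop <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    if (v[i] > 0) out[i] <- v[i]^2 else out[i] <- 0
  }
  out
}

# Hypothetical helper: the comparison, the squaring and the selection
# each process the entire vector in one call
clip_vec <- function(v) ifelse(v > 0, v^2, 0)

y_loop <- clip_loop(x)
y_vec  <- clip_vec(x)

print(all.equal(y_loop, y_vec))  # checks that both versions agree
print(system.time(clip_loop(x)))
print(system.time(clip_vec(x)))
```

The vectorized version executes three R-level operations no matter
how long x is, while the loop executes several per element.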

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"


