[Rd] Speed up code, profiling, optimization, lapply vs. loops

Kasper Daniel Hansen khansen at stat.berkeley.edu
Tue Jul 7 06:53:36 CEST 2009


Aside from the advice from other people, you seem to be doing many glm
calls. A big part of a call to a model function involves setting up
the design matrix, checking for missing values, etc. If I understand your
description correctly, you may only need to do this once. This will
require some poking around in glm, but might save you a lot of time.
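
As a rough illustration of the idea, the sketch below builds the model
frame and design matrix once and then calls glm.fit() directly for each
family. The simulated data, the formula y ~ x and the three built-in
quasi() variance functions are assumptions standing in for the actual
data and the theta-dependent family, which are not shown in the thread.

```r
# Simulated, strictly positive response; data and formula are illustrative only.
set.seed(1)
n <- 200
d <- data.frame(x = runif(n))
d$y <- rgamma(n, shape = 5, rate = 5 / exp(0.5 + d$x))

# One-time setup: model frame (NA handling), design matrix and response.
mf <- model.frame(y ~ x, data = d)
X  <- model.matrix(attr(mf, "terms"), mf)
y  <- model.response(mf)

# Stand-ins for the theta-dependent family objects (assumption: the real
# package would construct one family per value of theta).
families <- list(
  quasi(link = "log", variance = "mu"),
  quasi(link = "log", variance = "mu^2"),
  quasi(link = "log", variance = "mu^3")
)

# Only the fitting step is repeated; X and y are reused for every family.
fits <- lapply(families, function(fam) glm.fit(x = X, y = y, family = fam))
devs <- vapply(fits, function(f) f$deviance, numeric(1))
```

Whether this saves a noticeable amount of time depends on how the cost
splits between setup and fitting, so it is worth profiling before and
after the change.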

Kasper

On Jul 6, 2009, at 1:26 , Thorn Thaler wrote:

> Hi everybody,
>
> currently I'm writing a package that, for a given family of
> variance functions depending on a parameter theta, say, computes the
> extended quasi-likelihood (eql) function for different values of
> theta.
>
> The computation involves a couple of calls of the 'glm' routine.
> What I'm doing now is to call 'lapply' for a list of theta values
> and a function that constructs a family object for the particular
> choice of theta, computes the glm and uses the results to get the
> eql. Not surprisingly, the function is not very fast. Depending on
> the size of the parameter space under consideration, it takes a
> couple of minutes until the function finishes. Testing ~1000
> parameters takes about 5 minutes on my machine.
>
> I know that loops in R are slow more often than not. Thus, I thought
> using 'lapply' would be a better way. But anyway, it is just another
> kind of loop. Besides, it involves some overhead for the function call,
> and hence I'm not sure whether using 'lapply' is really the better
> choice.
>
> What I would like to figure out is where the bottleneck lies.
> Vectorization would help, but I don't think that there is a
> vectorized 'glm' function that can handle a vector of
> family objects, so I'm not aware of any choice aside from
> using a loop.
>
> So my questions:
> - how can I figure out where the bottleneck lies?
> - is 'lapply' always superior to a loop in terms of execution time?
> - are there any 'evil' commands that should be avoided in a loop
> because they slow down the computation?
> - are there any good books or tutorials about how to profile R code
> efficiently?
>
> Thanks in advance for your help,
>
> Thorn
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
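
For the questions about locating the bottleneck and about lapply versus
a loop, a minimal sketch using Rprof(), summaryRprof() and system.time()
might look like the following. The function fit_one(), the simulated
data and the way theta enters the model are hypothetical placeholders
for the real per-theta computation.

```r
# Simulated data; illustrative only.
set.seed(1)
d <- data.frame(x = runif(200))
d$y <- rpois(200, lambda = exp(0.5 + d$x))

# Hypothetical per-theta work: theta transforms the covariate so each fit differs.
fit_one <- function(theta) {
  d$z <- d$x^theta                      # modifies a local copy of d
  deviance(glm(y ~ z, family = poisson, data = d))
}
thetas <- seq(0.5, 2, length.out = 300)

# Profile the lapply version and list the functions with the largest total time.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
res_prof <- lapply(thetas, fit_one)
Rprof(NULL)
head(summaryRprof(prof_file)$by.total, 10)

# Time lapply against an explicit loop with a preallocated result list.
# Growing an object inside the loop (e.g. with c()) is a common way to make
# loops slow, so the list is created at full length up front.
t_lapply <- system.time(res_lapply <- lapply(thetas, fit_one))
t_loop <- system.time({
  res_loop <- vector("list", length(thetas))
  for (i in seq_along(thetas)) res_loop[[i]] <- fit_one(thetas[i])
})
c(lapply = t_lapply[["elapsed"]], loop = t_loop[["elapsed"]])
```

When each iteration is as heavy as a glm fit, the two timings are likely
to be close, since the per-iteration work dominates the looping
overhead; the profile output is the more informative of the two.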


