[R] speeding up functions for large datasets

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Aug 6 10:07:32 CEST 2004


On Fri, 6 Aug 2004 Freja.Vamborg at astrazeneca.com wrote:

> Dear R-helpers, 
> I'm dealing with large datasets, say tables of 60,000 by 12 or so, and
> some of the functions are (too) slow, so I'm trying to find ways to
> speed them up.
> I've found that, for instance, for-loops are slow in R (both by testing
> and by searching through the mail archives, etc.).

I don't think that is really true, but it is the case that using
row-by-row operations in your situation would be slow *if they are
unnecessary*. It is a question of choosing the right algorithmic approach,
not whether it is implemented by for-loops or lapply or ....
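
For illustration, a small sketch (assuming a numeric matrix of roughly the
size you describe): accumulating column sums row by row versus one
whole-object call to colSums().

  ## hypothetical matrix of about the size in the question: 60000 x 12
  x <- matrix(rnorm(60000 * 12), nrow = 60000)

  ## row-by-row accumulation: 60000 iterations, each doing very little work
  slow.colsums <- function(x) {
      s <- numeric(ncol(x))
      for (i in 1:nrow(x)) s <- s + x[i, ]
      s
  }

  system.time(slow.colsums(x))   # many small row extractions
  system.time(colSums(x))        # one vectorized call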

> Are there any other well-known things that are slow in R, maybe at the
> data-representation level, in code-writing, or in reading in the data?
> I've also tried incorporating C code, which works well, but I'd like to
> find other, maybe more "shortcut", ways.

`S Programming' (see the R FAQ) has a whole chapter on this sort of thing, 
with examples.  More generally you want to take a `whole object' view and 
use indexing and other vectorized operations.
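
As a small sketch of that style (again assuming the matrix `x' above):
recoding all negative entries to zero.

  ## element-by-element version
  x1 <- x
  for (i in 1:nrow(x1))
      for (j in 1:ncol(x1))
          if (x1[i, j] < 0) x1[i, j] <- 0

  ## whole-object version: one logical index, one assignment
  x2 <- x
  x2[x2 < 0] <- 0

  identical(x1, x2)   # TRUE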

Note also that what is slow changes with the version of R and, especially,
with how much memory you have installed.  The first step is to get enough
RAM.
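
On reading the data in: read.table() is considerably quicker, and lighter
on memory, if it is told what to expect rather than having to guess the
type of every column.  The file name and column types below are made up
for illustration.

  dat <- read.table("big.txt", header = TRUE,
                    colClasses = c("character", rep("numeric", 11)),
                    nrows = 60000, comment.char = "")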

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



