[R] Operating on windows of data

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Mar 22 12:22:21 CET 2004


On Mon, 22 Mar 2004, Ajay Shah wrote:

> On Mon, Mar 22, 2004 at 01:39:28AM -0500, Gabor Grothendieck wrote:
> > You can retain the trick of using subset and still get
> > rid of the loop in:
> > 
> >    http://www.mayin.org/ajayshah/KB/R/EXAMPLES/rollingreg.R
> > 
> > by using sapply like this (untested):
> > 
> > dat <- sapply( seq(T-width), function(i) {
> >     model <- lm(dlinrchf ~ dlusdchf + dljpychf + dldemchf, A, 
> >                 i:(i+width-1))
> >     details <- summary.lm(model)
> >     tmp <- coefficients(model)
> >     c( USD = tmp[2], JPY = tmp[3], DEM = tmp[4], 
> >            R2 = details$r.squared, RMSE = details$sigma )
> > } )
> > dat <- as.data.frame(t(dat))
> > attach(dat)
> 
> This brings me to a question I've always had about "the R way" of
> avoiding loops. Yes, the sapply() approach above works. My question
> is: Why is this much better than writing it using loops?

Why didn't you test it yourself and find out?  That's part of `doing your 
homework' mentioned in the posting guide.

> Loops tap into the intuition of millions of people who have grown up
> around procedural languages. Atleast to a person like me, I can read
> code involving loops effortlessly.
> 
> And I don't see how much faster the sapply() will be. Intuitively, we
> may think that the sapply() results in C code getting executed (in the
> R sources), while the for loop results in interpretation overhead, and
> so the sapply() is surely faster. But when the body of the for loop
> involves a weighty thing like a QR decomposition (for the OLS), that
> would seem to dominate the cost - as far as I can tell.

See `Writing R Extensions' for ways to time and profile R code, and `S
Programming' for some worked examples (but R has changed a lot since
2000).  It's not `would seem to': if you do your homework you will know.

Loops are often the clearest and sometimes the most efficient way in R.  
You should be worrying about what is inside the loop first: in your case 
calling lmfit not lm, not calling summary.lm just to get sigma and 
r.squared, and so on.  Profile your code, find the bottlenecks and 
eliminate them one by one.  If profiling shows that .Fortran("dqrls" 
dominates, you know that little speed-up is possible without finding 
different compiled code.  (BTW, up/downdating a QR decomposition seems to 
be the most promising approach to windowing a regression.)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list