[R] Operating on windows of data

Martin Maechler maechler at stat.math.ethz.ch
Mon Mar 22 12:19:41 CET 2004


>>>>> "Ajay" == Ajay Shah <ajayshah at mayin.org>
>>>>>     on Mon, 22 Mar 2004 16:18:41 +0530 writes:

    Ajay> On Mon, Mar 22, 2004 at 01:39:28AM -0500, Gabor
    Ajay> Grothendieck wrote:

   >>   You can retain the trick of using subset and still get
   >>   rid of the loop in:
   >>   
   >>      http://www.mayin.org/ajayshah/KB/R/EXAMPLES/rollingreg.R
   >>   
   >>   by using sapply like this (untested):
   >>   
   >>   dat <- sapply( seq(T-width), function(i) {
   >>       model <- lm(dlinrchf ~ dlusdchf + dljpychf + dldemchf, A, 
   >>                   i:(i+width-1))
   >>       details <- summary.lm(model)
   >>       tmp <- coefficients(model)
   >>       c( USD = tmp[[2]], JPY = tmp[[3]], DEM = tmp[[4]], 
   >>              R2 = details$r.squared, RMSE = details$sigma )
   >>   } )
   >>   dat <- as.data.frame(t(dat))
   >>   attach(dat)

    Ajay> This brings me to a question I've always had about
    Ajay> "the R way" of avoiding loops. Yes, the sapply()
    Ajay> approach above works. My question is: Why is this much
    Ajay> better than writing it using loops?

it's not, not at all.
And you are very much right in all you say below!

The important place for avoiding for() loops is in situations
where you can use truly vectorized operations instead,
e.g., replacing

  n <- length(x) ; r <- numeric(n) 
  for(i in 1:n) r[i] <- sin(x[i])

by  r <- sin(x).
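
As a minimal sketch of that replacement (the input vector x below is made
up purely for illustration), the two forms can be checked against each
other:

```r
# Hypothetical input vector, chosen only for illustration.
x <- seq(0, 2 * pi, length.out = 1e5)

# Element-by-element loop, with the result vector preallocated.
n <- length(x)
r_loop <- numeric(n)
for (i in seq_len(n)) r_loop[i] <- sin(x[i])

# Vectorized call: sin() iterates over x internally in compiled code.
r_vec <- sin(x)

# Both approaches produce the same values.
stopifnot(isTRUE(all.equal(r_loop, r_vec)))
```

Wrapping each variant in system.time() is one way to see the difference
on a given machine.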

Replacing for() loops with sapply() / lapply() can save some
computing time -- as you remark below -- particularly when the
function which they apply is simple -- and saving that time in
"inner" computations can become important.
OTOH, replacing an `outermost' for() loop with a `heavy' body,
as in the example above, is not at all the "R way"
{It may have been the S(-plus) way many years ago, when S was
 particularly inefficient in dealing with for() loops.}

Regards, Martin

    Ajay> Loops tap into the intuition of millions of people who
    Ajay> have grown up around procedural languages. At least to
    Ajay> a person like me, I can read code involving loops
    Ajay> effortlessly.

    Ajay> And I don't see how much faster the sapply() will
    Ajay> be. Intuitively, we may think that the sapply()
    Ajay> results in C code getting executed (in the R sources),
    Ajay> while the for loop results in interpretation overhead,
    Ajay> and so the sapply() is surely faster. But when the
    Ajay> body of the for loop involves a weighty thing like a
    Ajay> QR decomposition (for the OLS), that would seem to
    Ajay> dominate the cost - as far as I can tell.
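
One way to check this point is to run the same rolling regression both
ways and compare. The sketch below is only an illustration under stated
assumptions: the data frame A is simulated, the column names y, x1, x2
and the helper fit_window() are hypothetical stand-ins for the exchange
rate series and loop body in rollingreg.R, and the window count
n_obs - width + 1 is a choice made here so that every full window is
used.

```r
set.seed(1)
n_obs <- 200   # number of observations (T in the quoted code)
width <- 50    # window length

# Simulated data; y, x1, x2 are hypothetical column names.
A <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs))
A$y <- 0.5 * A$x1 - 0.2 * A$x2 + rnorm(n_obs, sd = 0.1)

# The "heavy" body: one OLS fit on the window starting at row i.
fit_window <- function(i) {
  rows  <- i:(i + width - 1)
  model <- lm(y ~ x1 + x2, data = A[rows, ])
  s     <- summary(model)
  b     <- coef(model)
  c(b1 = b[["x1"]], b2 = b[["x2"]], R2 = s$r.squared, RMSE = s$sigma)
}

n_windows <- n_obs - width + 1

# Explicit for() loop filling a preallocated matrix.
time_loop <- system.time({
  out_loop <- matrix(NA_real_, nrow = n_windows, ncol = 4,
                     dimnames = list(NULL, c("b1", "b2", "R2", "RMSE")))
  for (i in seq_len(n_windows)) out_loop[i, ] <- fit_window(i)
})

# The same computation with sapply(); t() puts one window per row.
time_sapply <- system.time({
  out_sapply <- t(sapply(seq_len(n_windows), fit_window))
})

# The two versions return the same numbers.
stopifnot(isTRUE(all.equal(out_loop, out_sapply)))

# Elapsed times for the two versions, for comparison on this machine.
print(rbind(loop = time_loop, sapply = time_sapply))
```

Both versions make the same n_windows calls to lm(), so any difference
in elapsed time comes only from the looping construct around them.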



