[R] C/C++/Fortran Rolling Window Regressions

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jul 21 22:28:33 CEST 2016


I would be careful about making assumptions regarding what is faster.
Performance tends to be nonintuitive.

When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
you provided rollapply/fastLm was three times faster than roll_lm.  Of
course this could change with data of different dimensions but it
would be worthwhile to do actual benchmarks before making assumptions.

I also noticed that roll_lm did not give the same coefficients as the other two.

set.seed(1)
library(zoo)
library(RcppArmadillo)
library(roll)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4
library(rbenchmark)
benchmark(fastLm = rollapplyr(z, width = 4,
     function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
     by.column = FALSE),
   lm = rollapplyr(z, width = 4,
     function(x) coef(lm(y ~ x, data = as.data.frame(x))),
     by.column = FALSE),
   roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop = F]), 4,
     center = FALSE))[1:4]


     test replications elapsed relative
1  fastLm          100    0.22    1.000
2      lm          100    0.72    3.273
3 roll_lm          100    0.64    2.909

On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
<roundsjeremiah at gmail.com> wrote:
>  Thanks all.  roll::roll_lm was essentially what I wanted.   I think maybe
> I would prefer it to have options to return a few more things, but it is
> the coefficients, and the remaining statistics you might want can be
> calculated fast enough from there.
>
>
> On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <Achim.Zeileis at uibk.ac.at>
> wrote:
>
>> Jeremiah,
>>
>> for this purpose there are the "roll" and "RcppRoll" packages. Both use
>> Rcpp and the former also provides rolling lm models. The latter has a
>> generic interface that let's you define your own function.
>>
>> One thing to pay attention to, though, is the numerical reliability.
>> Especially on large time series with relatively short windows there is a
>> good chance of encountering numerically challenging situations. The QR
>> decomposition used by lm is fairly robust while other more straightforward
>> matrix multiplications may not be. This should be kept in mind when writing
>> your own Rcpp code for plugging it into RcppRoll.
>>
>> But I haven't check what the roll package does and how reliable that is...
>>
>> hth,
>> Z
>>
>>
>> On Thu, 21 Jul 2016, jeremiah rounds wrote:
>>
>> Hi,
>>>
>>> A not unusual task is performing a multiple regression in a rolling window
>>> on a time-series.    A standard piece of advice for doing in R is
>>> something
>>> like the code that follows at the end of the email.  I am currently using
>>> an "embed" variant of that code and that piece of advice is out there too.
>>>
>>> But, it occurs to me that for such an easily specified matrix operation
>>> standard R code is really slow.   rollapply constantly returns to R
>>> interpreter at each window step for a new lm.   All lm is at its heart is
>>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
>>> window you are just incrementing a counter and peeling off rows (or
>>> columns
>>> of X and y) of a particular window size, and following that up with some
>>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
>>> practically writes itself and you might want a wrapper of something like:
>>> rolling_lm (y=y, x=x, width=4).
>>>
>>> My question is this: has any of the thousands of R packages out there
>>> published anything like that.  Rolling window multiple regressions that
>>> stay in C/C++ until the rolling window completes?  No sense and writing it
>>> if it exist.
>>>
>>>
>>> Thanks,
>>> Jeremiah
>>>
>>> Standard (slow) advice for "rolling window regression" follows:
>>>
>>>
>>> set.seed(1)
>>> z <- zoo(matrix(rnorm(10), ncol = 2))
>>> colnames(z) <- c("y", "x")
>>>
>>> ## rolling regression of width 4
>>> rollapply(z, width = 4,
>>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>>   by.column = FALSE, align = "right")
>>>
>>> ## result is identical to
>>> coef(lm(y ~ x, data = z[1:4,]))
>>> coef(lm(y ~ x, data = z[2:5,]))
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list