[R] Moving window regressions - how can I improve this code?

Gabor Grothendieck ggrothendieck at myway.com
Sun May 16 14:28:22 CEST 2004


Others have already addressed the bug.

Your other question seems to be how to get some simplification.
Note that Greg Warnes responded in:
http://article.gmane.org/gmane.comp.lang.r.general/18033
regarding the use of running in gregmisc. 

Another thing you could do
if you want to retain the speed of the lm.fit solution yet
not deal with the argument manipulations in that solution is
that you could pass the lm structure to your routine instead 
of reconstructing the lm structure in the routine, itself.  
That provides some simplification at the expense of a slightly 
more complex calling sequence:

# e.g. movingWindow3( lm(weight ~ height, women, x=TRUE, y=TRUE), 9 )
movingWindow3 <- function(lm., width) {
	x = lm.$x
	y = lm.$y
	nr = NROW(y)
	ncoef = length(coef(lm.))
	stopifnot(width >= ncoef, width <= nr)
	terms = lm.$terms
	inds = embed(1:nr, width)[, width:1]
	sumrys <- apply(inds, 1, function(ix) {
			fit = lm.fit(x[ix,,drop=F], y[ix])
			fit$terms = terms
			class(fit) = "lm"
			summary(fit)
	})
	list(coefficients = sapply(sumrys, function(sm) coef(sm)[,"Estimate"]),
	  Std.Error = sapply(sumrys, function(sm) coef(sm)[,"Std. Error"]),
	  sigma = sapply(sumrys, "[[", "sigma"),
	  r.squared = sapply(sumrys, "[[", "r.squared"))
}
z3 <- movingWindow3( lm(weight ~ height, women, x=TRUE, y=TRUE), 9 


--- 
Ajay Shah <ajayshah at mayin.org> writes:

> I had written a posting, some weeks ago, where I had written
> fortrannish code for "moving window regressions" (also called 'rolling
> window regression'), and asked how to do the code better. Both of you
> had put much pain on this, and emerged with this great code:
> 
> data(women)
> movingWindow2 <- function(formula, data, width, ...) {
> mCall = match.call()
> mCall$width = NULL
> mCall[[1]] = as.name("lm")
> mCall$x = mCall$y = TRUE
> bigfit = eval(mCall, parent.frame())
> ncoef = length(coef(bigfit))
> nr = nrow(data)
> width = as.integer(width)[1]
> stopifnot(width >= ncoef, width <= nr)
> y = bigfit$y
> x = bigfit$x
> terms = bigfit$terms
> inds = embed(seq(nr), width)[, rev(seq(width))]
> sumrys <- lapply(seq(nrow(inds)),
> function(st) {
> ind = inds[st,]
> fit = lm.fit(x[ind,], y[ind])
> fit$terms = terms
> class(fit) = "lm"
> summary(fit)
> })
> list(coefficients = sapply(sumrys, function(sm) coef(sm)[,"Estimate"]),
> Std.Error = sapply(sumrys, function(sm) coef(sm)[,"Std. Error"]),
> sigma = sapply(sumrys, "[[", "sigma"),
> r.squared = sapply(sumrys, "[[", "r.squared"))
> }
> 
> I have one piece of information, and one bugreport:
> 
> * I timed this, and compared it against my fortrannish code, and
> this is roughly 2.5x faster :-)
> 
> > junk = data.frame(x=runif(1000), y=runif(1000))
> > system.time(movingWindowRegression(women, 1000, 9, weight ~ height, 2))
> [1] 20.07 0.01 20.80 0.00 0.00
> > system.time(movingWindow2(y ~ x, junk, 10))
> [1] 8.27 0.03 8.43 0.00 0.00

You seem to be comparing timings on different data sets and models.
Did you mean to use junk and y ~ x in your first timing call?

> (My notebook is a Celeron @ 500 Mhz).
> 
> * I find it breaks when I ask him to do a regression on 1:
> 
> > r = movingWindowRegression(junk, 1000, 10, y ~ 1, 1)
> > movingWindow2(y ~ 1, junk, 10) 
> Error in lm.fit(x[ind, ], y[ind]) : `x' must be a matrix




More information about the R-help mailing list