[R] Moving window regressions - how can I improve this code?

Sat Apr 24 19:24:39 CEST 2004

Gabor Grothendieck <ggrothendieck at myway.com> writes:

> Douglas Bates produced an extremely useful post so I hesitate to 
> add more but here goes anyways. 
> 
> 1. The bug I referred to in my previous post is still in the code.  
> The last data point in women never gets used.  I think this
> goes beyond just making a simple error but gets to the point of
> how to avoid index arithmetic in the first place due to its
> error prone nature and excess complexity.    Using embed()
> or running() (the latter is in package gregmisc) would do that although
> at the expense of more memory.

You're right.  I miscounted and it is easy to do that when
manipulating the indices explicitly.  I wasn't aware of the embed
function but, now that you have pointed it out, I agree that it should
be used here.

> 2. I find using stopifnot convenient for testing assertions.  There
> is also a bug in the assertion code shown (width should be allowed 
> to equal nr) and again I think its due to the introduction of complexity.  
> The problem is that its harder to get it right when you have to invert 
> the assertion condition.  I have recently discovered the R 
> stopifnot function which allows one to state assertions without 
> inverting them and now use it all the time.   Once you use stopifnot
> the assertion becomes somewhat clearer since its not muddied by the
> inversion and in my mind it starts to bring out a wider range of
> considerations such as whether a width
> with zero or fewer degrees of freedom should be allowed.  Perhaps
> the condition should be width > length(all.vars(formula)) or if
> formulae such as y~x-1 are allowed then a more complex specification.

I agree that stopifnot is the best construction for assertions.
Thanks for pointing that out.  It is one of the very useful utilities
added by Martin Maechler, who also wrote the str function which gets
my vote for the best debugging tool in R.

If you don't know the total number of coefficients then you would need
to assert only  width > 0.  If you do know the number of coefficients
then you can assert width >= ncoef.

> With these changes (except I've kept width > 0), movingWindow becomes
> 
> movingWindow <- function(formula, data, width, ...) {
>     nr <- nrow(data)
>     width <- as.integer(width)[1]
>     stopifnot( width > 0, width <= nr )
>     indices <- as.data.frame( t( embed( 1:nr, width ) ) )
>     lapply(indices, function(st) summary(lm(formula, data[st,])) )
> }
> 
> Similar comments could be applied to movingWindow2.

Adopting your suggestions I would change the movingWindow2 code to

movingWindow2 <- function(formula, data, width, ...) {
    mCall = match.call()
    mCall$width = NULL
    mCall[[1]] = as.name("lm")
    mCall$x = mCall$y = TRUE
    bigfit = eval(mCall, parent.frame())
    ncoef = length(coef(bigfit))
    nr = nrow(data)
    width = as.integer(width)[1]
    stopifnot(width >= ncoef, width <= nr)
    y = bigfit$y
    x = bigfit$x
    terms = bigfit$terms
    inds = embed(seq(nr), width)[, rev(seq(width))]
    sumrys <- lapply(seq(nrow(inds)),
                     function(st) {
                         ind = inds[st,]
                         fit = lm.fit(x[ind,], y[ind])
                         fit$terms = terms
                         class(fit) = "lm"
                         summary(fit)
                     })
    list(coefficients = sapply(sumrys, function(sm) coef(sm)[,"Estimate"]),
         Std.Error = sapply(sumrys, function(sm) coef(sm)[,"Std. Error"]),
         sigma = sapply(sumrys, "[[", "sigma"),
         r.squared = sapply(sumrys, "[[", "r.squared"))
}

for which the test gives

> data(women)
> movingWindow2(weight ~ height, women, 9)
$coefficients
                 [,1]       [,2]       [,3]      [,4]      [,5]        [,6]
(Intercept) -59.77778 -67.127778 -73.422222 -81.97222 -91.77778 -106.127778
height        3.00000   3.116667   3.216667   3.35000   3.50000    3.716667
                   [,7]
(Intercept) -122.955556
height         3.966667

$Std.Error
                  [,1]       [,2]       [,3]       [,4]      [,5]       [,6]
(Intercept) 3.77647036 2.64466036 3.70537766 4.71400546 5.1522157 6.60219186
height      0.06085806 0.04194352 0.05784947 0.07246601 0.0780042 0.09846709
                 [,7]
(Intercept) 7.7792788
height      0.1143188

$sigma
[1] 0.4714045 0.3248931 0.4481000 0.5613193 0.6042180 0.7627228 0.8855094

$r.squared
[1] 0.9971276 0.9987338 0.9977411 0.9967352 0.9965351 0.9951107 0.9942195