[R] Autocorrelation in linear models

Arni Magnusson arnima at hafro.is
Thu Mar 17 00:48:11 CET 2011


I have been reading about autocorrelation in linear models over the last 
couple of days, and I have to say the more I read, the more confused I 
get. Beyond confusion lies enlightenment, so I'm tempted to ask R-Help for 
guidance.

Most authors are mainly worried about autocorrelation in the residuals, 
but some authors are also worried about autocorrelation within Y and 
within X vectors before any model is fitted. Would you test for 
autocorrelation both in the data and in the residuals?
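
In case it helps, this is roughly how I have been looking at the raw 
series so far (just a quick sketch; y, x1, and x2 are my data vectors, 
described below):

   acf(y)              # autocorrelation function of the response
   acf(x1); acf(x2)    # ...and of each predictor
   Box.test(y)         # lag-1 test on the raw response, before any fit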

If we limit our worries to the residuals, it looks like we have a variety 
of tests for lag=1:

   r <- residuals(fm); n <- length(r)            # residuals from the fitted model
   stats::cor.test(r[-n], r[-1])                 # correlation between consecutive residuals
   stats::Box.test(r)                            # Box-Pierce, lag 1 by default
   lmtest::dwtest(fm, alternative="two.sided")   # Durbin-Watson
   lmtest::bgtest(fm, type="F")                  # Breusch-Godfrey

In my model, a simple lm(y~x1+x2) with n=20 annual measurements, I have 
significant _positive_ autocorrelation within Y and within both X vectors, 
but _negative_ autocorrelation in the residuals. The residual 
autocorrelation is not quite significant, with the p-values

   0.070   (cor.test)
   0.064   (Box.test)
   0.125   (dwtest)
   0.077   (bgtest)

from the tests above. I seem to remember some authors saying that the 
Durbin-Watson test has less power than some alternative tests, which 
seems to be reflected here. The difference in p-values is substantial, 
so the choice of test could in many cases make a big difference for the 
subsequent analysis and conclusions. Most of them (cor.test, Box.test, 
bgtest) can also test lags>1, as sketched below. Which test would you 
recommend? I imagine the basic cor.test is somehow inappropriate for 
this; otherwise the other tests wouldn't have been invented, right?
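
For reference, this is how I understand the lags>1 versions are called; 
the lag/order of 3 is just an arbitrary example, not something from my 
analysis:

   r <- residuals(fm); n <- length(r)
   cor.test(r[seq(n-3)], r[-seq(3)])        # correlation at lag 3 only
   Box.test(r, lag=3, type="Ljung-Box")     # joint test of lags 1-3
   lmtest::bgtest(fm, order=3, type="F")    # Breusch-Godfrey, up to order 3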

car::dwt(fm) gives p-values that fluctuate by a factor of 2 between 
runs, unless I run a very long simulation, in which case the p-value 
ends up similar to the one from lmtest::dwtest, at least in my case.
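
By "a very long simulation" I mean something along these lines, where 
reps=100000 is just an example value and set.seed is only there to make 
the bootstrap p-value reproducible:

   library(car)
   set.seed(1)
   dwt(fm, reps=100000, alternative="two.sided")   # bootstrap p-value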

Finally, one question regarding remedies. If there were significant 
_positive_ autocorrelation in the residuals, some authors suggest 
remedying this by deflating the df (fewer effective df in the data) and 
redoing the t-tests of the regression coefficients, rejecting fewer null 
hypotheses. Does that mean that if the residuals are _negatively_ 
correlated, I should inflate the df (more effective df in the data) and 
reject more null hypotheses?
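
The adjustment I have in mind is something like the AR(1) 
effective-sample-size formula, neff = n*(1-rho)/(1+rho). This is just my 
reading of those authors, so please correct me if the sketch below 
misses the point:

   r    <- residuals(fm)
   rho  <- cor(r[-length(r)], r[-1])            # lag-1 residual autocorrelation
   neff <- length(r) * (1 - rho) / (1 + rho)    # effective n; > n when rho < 0
   df   <- neff - length(coef(fm))              # adjusted residual df
   b    <- coef(summary(fm))                    # estimates, SEs, t values
   2 * pt(abs(b[, "t value"]), df, lower.tail=FALSE)   # redone t-tests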

That's four question marks. I'd greatly appreciate guidance on any of 
them.

Thanks in advance,

Arni


