[R] Why are lagged correlations typically negative?

Thomas Lumley tlumley at u.washington.edu
Thu Aug 24 17:27:04 CEST 2006

On Thu, 24 Aug 2006, Bliese, Paul D LTC USAMH wrote:

> Recently, I was working with some lagged designs where a vector of
> observations at one time was used to predict a vector of observations at
> another time using a lag 1 design.  In the work, I noticed a lot of
> negative correlations, so I ran a simple simulation with 2 matched
> points.  The crude simulation example below shows that the correlation
> can be -1 or +1, but interestingly if you do this basic simulation
> thousands of times, you get negative correlations 66 to 67% of the time.
> If you simulate three matched observations instead of two you get
> negative correlations about 74% of the time and then as you simulate 4
> and more observations the number of negative correlations asymptotically
> approaches an equal 50% for negative versus positive correlations
> (though even with 100 observations one has 54% negative correlations).
> Creating T1 and T2 so they are related (and not correlated 1 as in the
> crude simulation) attenuates the effect.  A more advanced simulation is
> provided below for those interested.
> Can anyone explain why this occurs in a way a non-mathematician is
> likely to understand?
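Bliese's simulation code is not quoted in this reply. Here is a guess at the crude version in Python. It assumes each trial draws n + 1 independent standard normal values and correlates the first n with the last n (a lag-1 design). The helper names `pearson` and `share_negative` are hypothetical and exist only for this illustration.

```python
import random


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def share_negative(n_pairs, trials, seed=1):
    """Fraction of lag-1 correlations below zero for independent normal data.

    Assumption: each trial draws n_pairs + 1 independent N(0, 1) values and
    correlates x[:-1] with x[1:]; this is a guess at the crude setup.
    """
    rng = random.Random(seed)
    negative = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_pairs + 1)]
        if pearson(x[:-1], x[1:]) < 0:
            negative += 1
    return negative / trials


if __name__ == "__main__":
    # Share of negative lag-1 correlations for a few series lengths.
    for n in (2, 3, 4, 10, 100):
        print(n, round(share_negative(n, 10000), 3))
```

With `n_pairs=2` the share of negative correlations should sit close to 2/3. With `n_pairs=100` it should sit slightly above one half, which is the pattern described in the question.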

Consider the case of two lagged pairs from three points, from the viewpoint of the middle 
point.  The correlation is positive if the previous point is lower and the 
following point is higher, or vice versa. It is negative if the previous 
and following points are both higher or both lower.

Now, if the middle point is higher than the first point it is probably 
higher than average, and so it has a more than 50% chance (2 in 3, for 
independent observations) of also being higher than the third point. 
Similarly, if it is lower than the first point it is likely to be lower 
than the third point.

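So negative correlation is more likely than positive. With independent, 
continuously distributed observations the middle point is the highest or 
the lowest of the three with probability 2/3, which matches the 66 to 67% 
in your simulation.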
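Only the ordering of X, Y, Z matters, so the argument can be checked exactly by listing the six rankings. This assumes independent, identically distributed, continuous observations, which makes all six orderings equally likely and rules out ties. With only two pairs the sample correlation is exactly +1 or -1, and its sign is the sign of (Y - X)(Z - Y). The function name below is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def sign_of_two_pair_correlation(x, y, z):
    # With two pairs, (X, Y) and (Y, Z), the correlation is +1 or -1,
    # and its sign is the sign of (Y - X) * (Z - Y).
    return 1 if (y - x) * (z - y) > 0 else -1


# All 6 equally likely rankings of (X, Y, Z), assuming i.i.d. continuous data.
orderings = list(permutations((1, 2, 3)))

negative = sum(1 for x, y, z in orderings if sign_of_two_pair_correlation(x, y, z) < 0)
p_negative = Fraction(negative, len(orderings))

# The middle-point step: given Y > X, how often is Y also above Z?
given = [(x, y, z) for x, y, z in orderings if y > x]
p_y_above_z = Fraction(sum(1 for x, y, z in given if y > z), len(given))

print(p_negative, p_y_above_z)  # prints: 2/3 2/3
```

Four of the six orderings put Y at a peak or a trough, which is where the two-thirds comes from.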

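Working out the covariance may be useful even for non-mathematicians. Call 
the three points X, Y, Z and suppose they are independent. Then

   cov(X-Y, Y-Z) = cov(X,Y) - cov(Y,Y) - cov(X,Z) + cov(Y,Z)
                 =    0     -  var(Y)  -    0     +    0
                 = -var(Y)

So the step into the middle point and the step out of it are negatively 
correlated.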
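A quick numerical check of the covariance result cov(X-Y, Y-Z) = -var(Y). It assumes X, Y, Z are independent normal with a common standard deviation (2 here, chosen arbitrarily, so var(Y) = 4). Because var(X-Y) = var(Y-Z) = 2 var(Y), the correlation of the two steps should be -var(Y) / (2 var(Y)) = -1/2. The helper names are hypothetical.

```python
import random


def sample_cov(a, b):
    """Sample covariance with the usual n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate_differences(n, sigma, seed=2006):
    """Estimate cov(X-Y, Y-Z) and cor(X-Y, Y-Z) from n simulated triples.

    Assumption: X, Y, Z are independent N(0, sigma^2), so var(Y) = sigma**2.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    ys = [rng.gauss(0.0, sigma) for _ in range(n)]
    zs = [rng.gauss(0.0, sigma) for _ in range(n)]
    d_xy = [x - y for x, y in zip(xs, ys)]  # X - Y, the step into the middle point
    d_yz = [y - z for y, z in zip(ys, zs)]  # Y - Z, the step out of it
    c = sample_cov(d_xy, d_yz)
    r = c / (sample_cov(d_xy, d_xy) * sample_cov(d_yz, d_yz)) ** 0.5
    return c, r


if __name__ == "__main__":
    c, r = simulate_differences(200_000, sigma=2.0)
    print(f"cov = {c:.2f} (theory: -var(Y) = -4)")
    print(f"cor = {r:.3f} (theory: -1/2)")
```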


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
