# [R] Why are lagged correlations typically negative?

Bliese, Paul D LTC USAMH paul.bliese at us.army.mil
Thu Aug 24 17:06:01 CEST 2006

Recently, I was working with some lagged designs in which a vector of
observations at one time was used to predict a vector of observations at
another time using a lag-1 design.  In this work I noticed a lot of
negative correlations, so I ran a simple simulation with 2 matched
points.  The crude simulation example below shows that the correlation
can be -1 or +1.  Interestingly, though, if you repeat this basic
simulation thousands of times, you get negative correlations 66 to 67% of
the time.  If you simulate three matched observations instead of two, you
get negative correlations about 74% of the time.  As you simulate 4 or
more observations, the proportion of negative correlations asymptotically
approaches 50% (though even with 100 observations, 54% of the
correlations are negative).  Generating T1 and T2 so that they are
related but not identical (rather than perfectly correlated, as in the
crude simulation) attenuates the effect.  A more advanced simulation is
provided below for those interested.

Can anyone explain why this occurs in a way a non-mathematician is
likely to understand?

Thanks,

Paul

```
#############
# Crude simulation
#############
> (T1<-rnorm(3))
 -0.1594703 -1.3340677  0.2924988
> (T2<-c(T1[2:3],NA))
 -1.3340677  0.2924988         NA
> cor(T1,T2, use="complete")
 -1

> (T1<-rnorm(3))
 -0.84258593 -0.49161602  0.03805543
> (T2<-c(T1[2:3],NA))
 -0.49161602  0.03805543          NA
> cor(T1,T2, use="complete")
 1
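
# Repeating the crude simulation many times (a sketch; the helper name
# crude_lag_cor() and the seed are arbitrary choices for illustration).
crude_lag_cor <- function() {
  T1 <- rnorm(3)                     # three observations
  T2 <- c(T1[2:3], NA)               # the same series shifted by one (lag 1)
  cor(T1, T2, use = "complete.obs")  # uses only the 2 matched pairs
}
set.seed(1)
r <- replicate(10000, crude_lag_cor())
mean(r < 0)   # proportion of negative correlations, close to 2/3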

###########
###########
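lags <- function(nobs, nreps, rho = 1) {
  # NEG: 1 if the lagged correlation is negative or 0, 0 if positive
  # COR: the lagged correlation itself
  OUT <- data.frame(NEG = rep(NA, nreps), COR = rep(NA, nreps))
  nran <- nobs + 1  # need to generate 1 more random number than there are observations
  for (i in 1:nreps) {
    V1 <- rnorm(nran)
    # V2 correlates rho with V1 (rho = 1 makes V2 identical to V1)
    V2 <- sqrt(1 - rho^2) * rnorm(nran) + rho * V1
    V1 <- V1[1:(nran - 1)]  # time 1: observations 1..nobs
    V2 <- V2[2:nran]        # time 2: observations 2..nobs+1 (lag 1)
    r <- cor(V1, V2)
    OUT[i, 1] <- ifelse(r <= 0, 1, 0)
    OUT[i, 2] <- r
  }
  return(OUT)
}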
> LAGS.2<-lags(2,10000)  #Number of observations matched = 2
> colMeans(LAGS.2)  # column means; mean() no longer averages data frame columns in current R
NEG     COR
0.6682 -0.3364

```
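One way to see where the 66 to 67% for two matched pairs comes from (a sketch, assuming the three values are independent draws from a continuous distribution, as with `rnorm()`): with T1 = (a, b) and T2 = (b, c), the correlation of two pairs is +1 when the series keeps moving in the same direction and -1 when it turns back, i.e. it is negative exactly when the shared middle value b is the largest or smallest of the three. Of the 6 equally likely orderings of a, b and c, 4 put b at an extreme, giving 4/6 = 2/3. The snippet below enumerates those orderings; the variable names are illustrative only.

```r
# All 6 possible rank orderings of three distinct values (a, b, c).
# For i.i.d. continuous draws, each ordering is equally likely.
orderings <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                   c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))
colnames(orderings) <- c("a", "b", "c")

lag_sign <- apply(orderings, 1, function(v) {
  T1 <- v[1:2]       # (a, b)
  T2 <- v[2:3]       # (b, c): T1 shifted by one
  sign(cor(T1, T2))  # +1 if monotone, -1 if b is the max or min
})

# TRUE when b is the largest or smallest of the three values
b_extreme <- orderings[, "b"] %in% c(1, 3)

print(data.frame(orderings, lag_sign, b_extreme))
mean(lag_sign < 0)   # proportion of orderings with a negative correlation
```

The negative rows are exactly the rows where `b_extreme` is `TRUE`, which matches the roughly two-thirds rate in the repeated crude simulation.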