[R] ccf (cross correlation function) problems

Rolf Turner rolf.turner at xtra.co.nz
Tue Jan 29 23:04:46 CET 2013


Your question and your English are just fine!

If I were you, I would not mess around with the ccf() function but
would attack the question "directly" using the cor.test() function, with
sub-vectors of your x vector. Personally I find the notion of "lag" in acf()
and ccf() highly confusing and I always make "parity errors" --- i.e. I get
things backwards!
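
A minimal sketch of the sign convention, using simulated data rather than the data from the post. According to the help page for acf(), the lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t], so a y that follows x with a delay of d steps should show its peak at lag -d. The names z, xs, ys and the delay of 3 are made up for the illustration.

```r
set.seed(42)
z  <- rnorm(203)          # simulated white noise (hypothetical data)
ys <- z[1:200]            # ys[t] = z[t]
xs <- z[4:203]            # xs[t] = z[t+3], hence ys[t] = xs[t-3]: ys lags xs by 3

cc <- ccf(xs, ys, lag.max = 10, plot = FALSE)

# Lag at which the estimated cross-correlation is largest
peak <- drop(cc$lag)[which.max(drop(cc$acf))]
peak                      # expected to be -3, i.e. minus the delay
```

In other words, "y influenced by x with a delay" shows up on the negative side of the lag axis when the call is ccf(x, y).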

Moreover, the ccf() function is throwing information away; it truncates
the x vector to have the same length as y, i.e. 21, and so never uses
x[22:29] --- which have useful content in respect of lags less than 8.
You haven't a lot of data, so it is prudent not to be wasteful.
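
A minimal sketch of that truncation, assuming (from reading the source of ccf()) that the two series are aligned with ts.intersect() before anything is computed. The vectors below are stand-ins numbered 1 to 29 and 1 to 21 so that one can see which elements survive; xa, ya, xb and yb are made-up names.

```r
# Without start dates both series are taken to begin at time 1
xa <- ts(1:29)
ya <- ts(1:21)
ia <- ts.intersect(xa, ya)
nrow(ia)                       # 21 rows
as.vector(ia[, "xa"])          # elements 1 to 21 of xa; 22 to 29 are dropped

# With calendar years the overlap is 1990 to 2010 instead
xb <- ts(1:29, start = 1982)
yb <- ts(1:21, start = 1990)
ib <- ts.intersect(xb, yb)
range(time(ib))                # 1990 2010
as.vector(ib[, "xb"])          # elements 9 to 29 of xb; 1 to 8 are dropped
```

Either way only 21 of the 29 x values enter the calculation, whereas taking sub-vectors by hand uses a different 21-year window of x for each lag.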

What I would do:

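OP <- par(mfrow=c(3,3))    # 3 x 3 grid of scatterplots; save the old settings
for(i in 1:9) {
    # x[i:(20+i)] is the 21-year window of x that leads y by 9-i years
    # alternative="less" gives a one-sided test for negative correlation
    CT <- cor.test(x[i:(20+i)],y,alternative="less")
    PV <- CT$p.value
    cat("lag =",9-i,"p-value =",PV,"\n")
    COR <- sprintf("%1.3f",CT$estimate)
    plot(x[i:(20+i)],y,xlab="x",main=paste("lag =",9-i,"corr =",COR))
}
par(OP)                    # restore the previous graphics settings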
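
If a table is more convenient than printed lines, the same loop can be sketched so that it collects its results in a data frame. This assumes x and y are the vectors from the original post (with the value split by line wrapping read as 205.0292); res is a made-up name.

```r
x <- c(105.3381,126.2792,121.7298,110.35,133.1647,140.5724,183.8853,
       177.0154,181.2147,186.4154,209.6958,205.0292,184.9683,222.9683,
       219.8538,268.1029,249.1545,228.942,198.2119,171.0913,146.346,
       166.3192,163.5747,173.3394,180.7952,176.8276,159.7074,150.6029,
       110.9653)
y <- c(32.93415,45.75486,29.36993,23.70824,21.30857,19.78977,16.88913,
       22.25963,19.32558,19.73704,22.62746,28.90173,27.66794,26.23163,
       28.69109,22.04674,26.47496,33.03602,41.62231,28.96627,31.80892)
stopifnot(length(x) == 29, length(y) == 21)   # 1982-2010 and 1990-2010

# One row per lag: window i of x is paired with all of y, as in the loop above
res <- do.call(rbind, lapply(1:9, function(i) {
    ct <- cor.test(x[i:(20 + i)], y, alternative = "less")
    data.frame(lag = 9 - i, corr = unname(ct$estimate), p.value = ct$p.value)
}))
res
res[res$lag == 6, ]   # the 6-year delay suggested in the post
```

The row with lag 6 pairs x[3:23] (1984 to 2004) with y (1990 to 2010).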

HTH

cheers,

Rolf Turner

On 01/29/2013 11:26 PM, Larissa Modica wrote:
> Hello everybody,
>
> I am sorry if my questions are too simple or not easily understandable. I’m
> not a native English speaker and this is my first analysis using this
> function.
>
> I have a problem with a cross correlation function and I would like to
> understand how I have to perform it in R.
>
> I have yearly data of an independent variable (x) from 1982 to 2010, and I
> also have yearly data of a variable (y) from 1990 to 2010.
>
> I think y could be influenced by the variable (x) with a delay of 6 years.
>
> When I plot the data of x from 1986 to 2006 against the data of y from 1990
> to 2010, the plot shows an opposite trend, i.e. when the variable x was
> high in 1986, the variable y was low in 1990, and so on until the end of
> the time series.
>
> Consequently I expect that the two time series are correlated with a
> negative correlation value.
>
>   Namely:
>
> y[year] = f(x[year - lag]).
>
> And corr has a negative value.
>
> I write here the script I have performed in R.
>
> a)
>
>
>
> x<-c(105.3381,126.2792,121.7298,110.35,133.1647,140.5724,183.8853,177.0154,181.2147,186.4154,209.6958,205.0292,184.9683,
>
> 222.9683,219.8538,268.1029,249.1545,228.942,198.2119,171.0913,146.346,166.3192,163.5747,173.3394,180.7952,176.8276,159.7074,150.6029,110.9653)
>
> y<-c(32.93415,45.75486,29.36993,23.70824,21.30857,19.78977,16.88913,22.25963,19.32558,19.73704,22.62746,28.90173,27.66794,
>
> 26.23163,28.69109,22.04674,26.47496,33.03602,41.62231,28.96627,31.80892)
>
> x<-ts(x)
>
> y<-ts(y)
>
> dumb<-ccf( x,y, ylab = "cross-correlation",  xlab = "Time lag",
> main = "y influenced by x")
>
> dumb
>
>
>
> Autocorrelations of series ‘X’, by lag
>
>
>
>    -10     -9     -8     -7     -6     -5     -4     -3     -2     -1
>  0.083  0.133  0.253  0.323  0.386  0.515  0.544  0.609  0.448  0.118
>
>      0      1      2      3      4      5      6      7      8      9
> -0.154 -0.283 -0.416 -0.326 -0.265 -0.217 -0.285 -0.340 -0.315 -0.254
>
>     10
> -0.188
>
>
>
> My question is:
>
> Is the script correct to ask the question I need to answer?
>
> Do x and y have to have the same length (i.e. do I have to consider the same
> number of years)?
>
> What does this result mean?
>
> My interpretation is: the highest correlation was at a lag of -3 years.
>
> Does it mean that what happened to the “x” variable in 1987 influenced “y” in 1990?
>
>
>
>
>
> Also, if that was not correct, is it correct to write:
>
> b)
>
> x<-c(105.3381,126.2792,121.7298,110.35,133.1647,140.5724,183.8853,177.0154,181.2147,186.4154,209.6958,205.0292,184.9683,
>
> 222.9683,219.8538,268.1029,249.1545,228.942,198.2119,171.0913,146.346,166.3192,163.5747,173.3394,180.7952,176.8276,159.7074,150.6029,110.9653)
>
> y<-c(32.93415,45.75486,29.36993,23.70824,21.30857,19.78977,16.88913,22.25963,19.32558,19.73704,22.62746,28.90173,27.66794,
>
> 26.23163,28.69109,22.04674,26.47496,33.03602,41.62231,28.96627,31.80892)
>
> x<-ts(x)
>
> y<-ts(y)
>
> dumb<-ccf( x[3:23],y, ylab = "cross-correlation",  xlab = "Time lag",
> main = "y influenced by x")
>
>
>
> dumb
>
>
>
> Autocorrelations of series ‘X’, by lag
>
>
>
>    -10     -9     -8     -7     -6     -5     -4     -3     -2     -1
>  0.104  0.221  0.257  0.393  0.478  0.601  0.517  0.406  0.087 -0.270
>
>      0      1      2      3      4      5      6      7      8      9
> -0.481 -0.397 -0.344 -0.241 -0.284 -0.349 -0.337 -0.265 -0.198 -0.161
>
>     10
>  0.044
>
>
>
> As I understand it, these results mean that the highest correlation is observed
> when the lag = 0. That corresponds to the difference of 6 years that I set up when I
> wrote x[3:23], which simply means working with the years from 1984 to 2004.
>
>
>
> In summary I would like to know:
>
> 1) if the analysis is correct in the way a) or in the way b)
>
> 2) if there is another way to demonstrate that the variable x has an
> influence on the variable y with a delay of 6 years.
>
>
>
> Thank you very much to anybody who can help me.


