[R] Generating an autocorrelated binary variable

Fri Sep 28 05:02:19 CEST 2012

I have no idea what your code is doing, nor why you want correlated binary
variables.  Correlation makes little or no sense in the context of 
binary random
variables --- or more generally in the context of discrete random variables.

Be that as it may, it is an easy calculation to show that if X and Y are 
binary
random variables both with success probability of 0.5 then cor(X,Y) = 0.2 if
and only if Pr(X=1 | Y = 1) = 0.6.  So just generate X and Y using that 
fact:

set.seed(42)
X <- numeric(1000)
Y <- numeric(1000)
for(i in 1:1000) {
    Y[i] <- rbinom(1,1,0.5)
    X[i] <- if(Y[i]==1) rbinom(1,1,0.6) else rbinom(1,1,0.4)
}

# Check:
cor(X,Y) # Get 0.2012336

Looks about right.  Note that the sample proportions are 0.484 and
0.485 for X and Y respectively.  These values do not differ significantly
from 0.5.

     cheers,

             Rolf Turner

On 28/09/12 08:26, Simon Zehnder wrote:
> Hi R-fellows,
>
> I am trying to simulate a multivariate correlated sample via the Gaussian copula method. One variable is a binary variable, that should be autocorrelated. The autocorrelation should be rho = 0.2. Furthermore, the overall probability to get either outcome of the binary variable should be 0.5.
> Below you can see the R code (I use for simplicity a diagonal matrix in rmvnorm even if it produces no correlated sample):
>
> "sampleCop" <- function(n = 1000, rho = 0.2) {
> 	
> 	require(splus2R)
> 	mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3))
> 	pmvrs <- pnorm(mvrs, 0, 1)
> 	var1 <- matrix(0, nrow = n + 1, ncol = 1)
> 	var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5)
> 	if(var1[1] == 0) var1[nrow(mvrs)] <- -1
> 	for(i in  1:(nrow(pmvrs) - 1)) {
> 		if(pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i]
> 		else var1[i + 1] <- var1[i] * (-1)
> 	}
> 	sample <- matrix(0, nrow = n, ncol = 4)
> 	sample[, 1] <- var1[1:nrow(var1) - 1]
> 	sample[, 2] <- var1[2:nrow(var1)]
> 	sample[, 3] <- qnorm(pmvrs[1:nrow(var1) - 1, 2], 0, 1, 1, 0)
> 	sample[, 4] <- qnorm(pmvrs[1:nrow(var1) - 1, 3], 0, 1, 1, 0)
> 	
> 	sample
> 	
> }
>
> Now, the code is fine, everything compiles. But when I compute the autocorrelation of the binary variable, it is not 0.2, but 0.6. Does anyone know why this happens?