# [R] generating correlated Bernoulli random variables

Bernhard Klingenberg Bernhard.Klingenberg at williams.edu
Tue Jul 3 14:37:29 CEST 2007

>
> > Hi all,
> > I was wondering how to generate samples for two RVs X1 and X2.
> >
> > X1 ~ Bernoulli (p1)
> > X2 ~ Bernoulli (p2)
> >
> > Also, X1 and X2 are correlated with correlation \rho.
>

You can use the rmvbin() function in the bindata package, e.g.,

require(bindata)
n=10
p1=0.5
p2=0.3
rho=0.2
rmvbin(n, c(p1,p2), bincorr=(1-rho)*diag(2)+rho)
?rmvbin

However, as pointed out before, rho is bounded below and above by some
function of the marginal probabilities. (Try above code with rho=0.9)
You may want to use the odds ratio (which is unrestricted) to specify
the association between the two binary variables and then convert this
odds ratio, for given marginal probabilities p1 and p2, into a (valid)
correlation rho to be used in rmvbin().

Here is some ad hoc code to do that:

bincorr <- function(OR, p1, p2) {    #from odds ratio to binary correlation
if (OR==1) p11=p2-p2+p1*p2 else {
p11_1=p2-(1/2/(1-OR)*(1-p1+OR*p1+p2-OR*p2-
sqrt((-1+p1-OR*p1-p2+OR*p2)^2-4*(1-OR)*(p2-p1*p2))))
p11_2=p2-(1/2/(1-OR)*(1-p1+OR*p1+p2-OR*p2-
sqrt((-1+p1-OR*p1-p2+OR*p2)^2)-4*(1-OR)*(p2-p1*p2)))
if (p11_1>0 && p11_1<=p1 && p11_1<p2) p11=p11_1 else p11=p11_2
}
bincorr=(p11-p1*p2)/sqrt(p1*(1-p1)*p2*(1-p2))
return(bincorr)
}

For instance, try

sapply(c(0,0.5,1,1.5,3,10,100),function(x) bincorr(x,p1,p2))

to see the range of valid correlations for odds ratios between 0 and
100, with p1 and p2 as above.

Bernhard Klingenberg
Dept. of Mathematics and Statistics
Williams College, MA
www.williams.edu/~bklingen