[R] Generating a binomial random variable correlated with a

Sun Apr 17 15:10:43 CEST 2005

On 17-Apr-05 Ted Harding wrote:
> [...]
> So I'd suggest experimenting on the following lines.
> 
> 1. Let X1 be a sample of size N using rbinom(N,1,p)
>    (where, in general, p need not be 0.5)
> 
> 2. Let Y be a sample of size N using rnorm(N,mu,sigma)
>    (and again, in general, mu need not be 0 nor sigma 1).
> 
> This is as in your example. So far, you have a true
> Binomial variate X1 and a true Normal variate Y.
> 
> 3. X1 will have r 1s in it, and (N-r) 0s. Now consider
>    setting up a correspondence between the 1s and 0s in
>    X1 and the values in Y, in such a way that the 1s
>    tend to get allocated to the higher values of Y and
>    the 0s to lower values of Y. The resulting set of
>    pairs (X1,Y) is then your correlated sample.
> 
>    In other words, permute the sample X1 and cbind the
>    result to Y.
> [...]
> Here is an example of a possible approach:
> 
>   X0 <- rbinom(100,1,0.5)
>   Y <- rnorm(100) ; Y<-sort(Y)
>   p0<-((1:100)-0.5)/100 ; p0<-p0/sum(p0); p<-cumsum(p0)
>   r<-sum(X0); ix<-sample((1:100),r,replace=FALSE,prob=p)
>   X1<-numeric(100); X1[ix]<-1;sum(X1)
>   ## [1] 50
>   cor(X1,Y)
>   ## [1] 0.6150937

Here is a follow-up (somewhat inspired by a remark in Bill's
suggestions). Consider the function

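  test <- function(C, N) {
    R <- numeric(N); corrs <- numeric(N)
    for (i in seq_len(N)) {
      ## True Binomial sample; r is the number of 1s it contains
      X0 <- rbinom(100, 1, 0.5); r <- sum(X0)
      ## True Normal sample, sorted into ascending order
      Y <- sort(rnorm(100))
      ## Selection weights: logit of the plotting positions, scaled by C
      p0 <- ((1:100) - 0.5)/100; L <- log(p0/(1 - p0))
      p <- exp(C*L); p <- p/(1 + p)
      ## Place the r 1s at positions drawn with weights p, so that
      ## larger C pushes the 1s towards the higher values of Y
      ix <- sample(1:100, r, replace = FALSE, prob = p)
      X1 <- numeric(100); X1[ix] <- 1
      R[i] <- r; corrs[i] <- cor(X1, Y)
    }
    list(R = R, corrs = corrs)
  }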
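
To see what C does, it may help to look at the sampling weights
themselves. The sketch below recomputes p as in test(); the helper
name weights_for is mine and only for illustration. Since
exp(C*L) = (p0/(1-p0))^C, C = 0 gives equal weights of 0.5 (no
preference for high Y), C = 1 gives p = p0, and larger C pushes the
weights towards 0 at the low end of Y and towards 1 at the high end.

```r
## Hypothetical helper: the weights p used inside test(), as a function of C
weights_for <- function(C, n = 100) {
  p0 <- ((1:n) - 0.5)/n
  L  <- log(p0/(1 - p0))        # logit of the plotting positions
  p  <- exp(C*L)
  p/(1 + p)                     # inverse logit of C*L
}

## Weights at the 5th, 50th and 95th sorted positions for several C
## (one column per value of C)
sapply(c(0, 0.5, 1, 5.2),
       function(C) round(weights_for(C)[c(5, 50, 95)], 4))
```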

By experimenting with different values of C, you can see what
you get. For instance:

  mean(test(0.1,1000)$corrs)
  ## [1] 0.05958913

  mean(test(0.5,1000)$corrs)
  ## [1] 0.2722242

  mean(test(1.0,1000)$corrs)
  ## [1] 0.4347375

and finally (in my trials):

  mean(test(5.2,1000)$corrs)
  ## [1] 0.702623

  mean(test(5.22,10000)$corrs)
  ## [1] 0.6998803

and that might just be close enough to your desired 0.7 ...!
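
Rather than adjusting C by hand, one could let uniroot() do the
search. What follows is only a sketch under my own assumptions: the
function name mean_corr, the fixed seed, N = 200 and the bracket
from 1 to 10 are not part of the trials above. Re-seeding on every
call makes the mean correlation a deterministic function of C, so
the root-finder is not chasing simulation noise. The root it returns
still depends on the seed and on N.

```r
## Mean correlation over N replicates for a given C (same steps as test()).
## The set.seed() call is an addition made here so that repeated calls
## with the same C give the same answer.
mean_corr <- function(C, N = 200, seed = 1) {
  set.seed(seed)
  p0 <- ((1:100) - 0.5)/100
  L  <- log(p0/(1 - p0))
  p  <- exp(C*L); p <- p/(1 + p)
  corrs <- numeric(N)
  for (i in seq_len(N)) {
    r  <- sum(rbinom(100, 1, 0.5))   # number of 1s in a Binomial sample
    Y  <- sort(rnorm(100))
    ix <- sample(1:100, r, replace = FALSE, prob = p)
    X1 <- numeric(100); X1[ix] <- 1
    corrs[i] <- cor(X1, Y)
  }
  mean(corrs)
}

target <- 0.7
## The bracket 1..10 is an assumption, guided by the trial values above
sol <- uniroot(function(C) mean_corr(C) - target,
               lower = 1, upper = 10, tol = 0.01)
sol$root
```

A larger N gives a steadier answer at the cost of run time, and
checking the result with a different seed shows how much of the
fitted C is down to the particular random numbers used.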

So now you have a true Binomial sample, an associated true
Normal sample, and a Pearson correlation between their values
which averages about 0.7 over repeated samples (in any single
sample it will vary around that).
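
As an aside on why allocating the 1s to the higher values of Y
raises the correlation: for a 0/1 variable the Pearson correlation
reduces to the point-biserial form, that is, the difference between
the mean of Y among the 1s and the mean of Y among the 0s, scaled by
the SD of Y and by the split between 1s and 0s. The check below is a
small sketch; the seed and the choice of weights p0 (i.e. C = 1) are
arbitrary and only for illustration.

```r
set.seed(42)                         # arbitrary seed, reproducibility only
Y  <- sort(rnorm(100))
p0 <- ((1:100) - 0.5)/100
r  <- sum(rbinom(100, 1, 0.5))
ix <- sample(1:100, r, replace = FALSE, prob = p0)   # the C = 1 weights
X1 <- numeric(100); X1[ix] <- 1

n  <- length(Y); n1 <- sum(X1); n0 <- n - n1
M1 <- mean(Y[X1 == 1])               # mean of Y where X1 is 1
M0 <- mean(Y[X1 == 0])               # mean of Y where X1 is 0

## sd() uses the n-1 denominator, hence n*(n-1) under the square root
r_pb <- (M1 - M0)/sd(Y) * sqrt(n1*n0/(n*(n - 1)))

all.equal(r_pb, cor(X1, Y))
```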

Now of course comes the question which has been intriguing
us all: What is the purpose of achieving a given Pearson
correlation between a Normal variate and a binary variate?

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-05                                       Time: 14:10:43
------------------------------ XFMail ------------------------------